Abstract

We study minority games in the efficient regime. By incorporating the utility
function and aggregating agents with similar strategies, we develop an
effective mesoscale notion of the state of the game. Using this approach, the
game can be represented as a Markov process with a substantially reduced
number of states and explicitly computable probabilities. For any payoff, the
finiteness of the number of states is proved. Interesting features of an
extensive random variable, called the aggregated demand, viz. its strong
inhomogeneity and the presence of patterns in time, can be easily interpreted.
Using Markov theory and the quenched disorder approach, we explain important
macroscopic characteristics of the game: the behavior of the variance per
capita and the predictability of the aggregated demand. We prove that in the
case of a linear payoff many attractors in the state space are possible.

Keywords: Minority game, adaptive system, Markov process, mesoscopic 
scale 



1. Introduction 

Evolution of complex systems capable of adapting to varying environments
by using shared memory is often considered one of the fundamental dynamical
problems in the sciences. However, the large number of parameters needed a
priori to describe such systems makes their exact analytical treatment
difficult. More efficient approaches are based on computational methods and
direct modelling of adaptive systems with populations of agents. In the
course of developing these models it was soon realized that even a simplified
approach with no communication between individuals, where the only dependence
between them is given by a common memory resource, is useful and interesting.
Among the variety of multi-agent models, the minority game (MG) provides a
particularly intuitive representation of self-adaptation, in which individuals
reason inductively and their rationality is bounded. The MG was originally
designed in Ref. [1] to account for the profitability of acting opposite to
the majority of decision makers. The model was subsequently formalized in
Refs. [2, ?] and became a well-established area of game-theoretical, dynamical
and statistical research.

The MG is a typical bottom-up construct and therefore the usual definitions
of the game first specify rules of behaviour for individuals. Then, piecing
together microscopic variables, one defines higher-order quantities
characterizing larger systems. In some cases, however, other descriptions are
also possible, e.g. functions of state such as score functions can be
attributed to groups of agents without specifying agents individually [?].
Moreover, despite the apparent simplicity of the basic decision rules, the
adaptive abilities and phenomenology of populations playing MGs turn out to be
surprisingly non-trivial [3, ?]. As shown in Refs. [?, ?], the phenomenology of
the MG depends qualitatively on the game parameters. For example, the
macroscopic quantity called the aggregate attendance, or aggregate demand,
which pools together individual choices, identifies three regimes of the MG:
random, cooperation and herd. Following the authors of Refs. [?, ?], the latter
case is also called efficient, because the total number of strategies is small
compared to the number of agents, and players have access to all available
information. In addition, in this exceptional case the relatively small number
of parameters enables analytical solutions.

The very first attempt at solving the MG analytically was based on a method
of statistical mechanics called the replica analysis. In order to find a more
detailed analogy between statistical physics and the MG, the group of Challet
and Zhang [?, 10] limited their analysis to only two strategies per agent,
where the manifestation of cooperative effects is the strongest. The agents'
choice is then treated analogously to the projection of a particle's spin on a
quantization axis. The aggregate demand is split into two terms: a
deterministic one, forced by the quenched disorder, and a stochastic one that
is further neglected. The quenched disorder term is related to systems in
statistical physics in which some parameters defining the system's behavior
are stationary random variables, chosen when the system is created. Even such
a simplified approach led to quite accurate analytical solutions for the
variance per capita as a function of the control parameter in the random
regime. Additionally, the authors of Ref. [4] showed that properties of the MG
in the symmetric phase depend on the initial conditions, which was confirmed
numerically in Ref. [11]. If the initial conditions (i.e. the strategies) are
drawn randomly, the system exhibits so-called quenched or frozen disorder.
This theory, however, provides little knowledge of the underlying dynamics of
the game, i.e. of the evolution of the utilities of strategies and of the
existence of time patterns. Nor does it explain the differences in macroscopic
observables for different payoffs.

Another analytical approach, based on the generating functional, is offered
by Heimel and Coolen in Ref. [13]. This is the second most used technique,
applicable in statistical physics to the problem of disordered systems with
random interactions. This method is in principle exact in the limit
$N \to \infty$, although generally more difficult to apply than the replica
procedure. The authors redefine the game for two strategies in such a way
that, instead of two independent utility values, they operate on only one
variable $q$ combining these two for each agent. As a result, the generalized
MG is driven by only three equations, where the vector
$\mathbf{q} = (q_1, \ldots, q_N)$ represents the state and $N$ is the number of
players. The game is then described in terms of the microscopic probability
densities $\Pr(\mathbf{q})$, where the discrete-time dynamics is replaced by a
continuous-time one. Since the state depends on $N$, the behavior in the limit
$N \to \infty$ can be examined. Similarly to the replica analysis, the method
does not provide any insight into the game dynamics.

Concurrently, the group of Johnson introduced the so-called crowd-anticrowd
theory, offering approximate expressions for the aggregate demand [14], [15].



Agents act as a crowd if they use the same strategy. If there is a group of
agents simultaneously using the strategy anticorrelated to the first one, they
make the opposite decisions and are considered an anticrowd. Many different
pairs of crowds and anticrowds exist at the same time. If the sizes of a crowd
and its anticrowd are similar, as is the case in the cooperation regime, then
the choices of these two groups cancel each other and the volatility is kept
small. If the crowd dominates, the majority of agents behave in the same
manner and the volatility becomes large. It has been demonstrated that, for
the fluctuations of the aggregated demand, the analytical results are
consistent with the numerical ones. Following the crowd-anticrowd reasoning,
Jefferies et al. in Ref. [5] cast the game into a functional map which
reproduces the game when iterated. Such an approach has a serious advantage
over the heuristically introduced rules in Refs. [?, ?], since it does not need
to keep track of labels for individual agents. In the definition of the
functional map, agents who hold the same combination of strategies are grouped
together. In the MG, individuals with the same strategies respond in the same
way to all values of the global information $\mu \in \{0, 1, \ldots, P-1\}$
($P$ standing for the number of possible realizations of the history $\mu$ of
winning decisions), provided that the game starts with the same initial
utilities for all strategies. The grouping is done using an $S$-dimensional
tensor, where $S$ is the number of strategies per agent. Assuming that the
Reduced Strategy Space (RSS) [?] is used, each index of the tensor runs over
$2P$ values and each entry is equal to the number of agents holding the
corresponding combination of strategies. A concept of state based on (i) the
utilities of pairwise different strategies and (ii) the history of past
winning decisions is subsequently introduced. Collecting the above elements, a
set of time-dependent equations which reproduce the essential dynamics of the
minority game is written down. The authors found that the MG can be
interpreted as a stochastically disturbed deterministic system. To simplify
the analysis, the stochastic term is dropped and attention is paid only to the
deterministic part of the game, which is then called the Deterministic MG.
These first studies of the dynamics show that the microscopic dynamics is
markedly affected by the choice of the payoff function. The behavior of the
game is dictated by the realization of the distribution of agents over
strategies and not just by specific game parameters. Hence, without knowledge
of the disorder, the game cannot be classified as being in either the
efficient or the inefficient regime. In Ref. [17], the dynamical approach is
extended to the analysis of stochastic terms. The analytical results obtained
there correctly explain the variance per capita in the herd regime for a
linear payoff, but no description of the dynamics or predictabilities is given.
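For $S = 2$ the grouping tensor reduces to a $2P \times 2P$ matrix whose entry $(s_1, s_2)$ counts the agents holding that ordered pair of strategies. The following minimal Python sketch builds such a matrix from a list of agents' strategy pairs; the integer labelling of strategies and the function name `strategy_matrix` are our own illustrative assumptions, not the notation of Ref. [5].

```python
import random


def strategy_matrix(agent_strategies, n_strategies):
    """Count agents per ordered pair of strategies (the S = 2 case of the tensor).

    agent_strategies: one (s1, s2) tuple per agent, with strategies labelled
    0 .. n_strategies - 1 (a hypothetical labelling; n_strategies = 2P in the RSS).
    Returns omega with omega[s1][s2] = number of agents holding (s1, s2).
    """
    omega = [[0] * n_strategies for _ in range(n_strategies)]
    for s1, s2 in agent_strategies:
        omega[s1][s2] += 1
    return omega


P = 4  # m = 2 histories of length 2
rng = random.Random(0)
agents = [(rng.randrange(2 * P), rng.randrange(2 * P)) for _ in range(101)]
omega = strategy_matrix(agents, 2 * P)
print(sum(map(sum, omega)))  # 101: every agent is counted exactly once
```

Agents sharing a matrix entry respond identically to every $\mu$, which is why the map never needs individual agent labels.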

There are some similarities between the crowd-anticrowd theory and our
mesoscopic approach, introduced in Ref. [18] and further developed in this
article. We incorporated the same concept of state as in Ref. [19] for the
step-like payoff. We found, however, that the linear payoff requires a
different definition [18]. In the mesoscopic approach we aggregated agents
playing the same strategies into fractions and treated each fraction as one
player. This approach allowed us to represent the game in the herd regime as a
Markov process, regardless of the payoff. We found it crucial to incorporate
the stochastic transitions in the model; otherwise it is impossible to
describe the real dynamics analytically. The mesoscopic approach was developed
in stages, starting with Ref. [18]. First, we examined the system where
fractions are of equal sizes. The following statements were proved: (i) the
utility is bounded and the number of states is finite, (ii) the transitions
include both stochastic and deterministic ones. Incorporating these results,
we worked out a methodology for finding the Markov representation of the
process. Our analyses based on the dynamics of the utility were mostly limited
to the step-like payoff function and were technically hard to generalize. In
addition, some important macroscopic observables, like the demand variance per
capita and predictability, were not yet analyzed, and the quenched disorder
was neglected. Here, we extend the method, providing a consistent theory
comprising different payoffs and quenched disorder. We start in section 3,
where macroscopic differences between games with different payoffs are
presented. The theory of how to describe the game in terms of a Markov process
is provided in section 4. In many cases the explanation of macroscopic
observables required relaxing the assumption of equal fraction sizes, and we
proved that such a relaxation affects the transition probabilities.
Interestingly, increasing the number of players does not make systems with
equal and unequal fractions alike, even if in the latter case the
distributions of sizes are symmetric. Our analysis of the attractor structure
of the Markov chain explains this and other dynamical phenomena observed in
the herd regime, viz. oscillations of the aggregate attendance, their
periodicity and predictability, and their dependence on the payoff form. These
results are presented in section [?]. Numerical studies of the periodicity in
time are also found in Refs. [20]. A more comprehensive review of the
literature is presented in Ref. [21].



2. The Formal Definition of the Minority Game 

At each time step $t$, the $n$-th agent out of $N$ ($n = 1, \ldots, N$) takes an
action $a_{\alpha_n}(t)$ according to some strategy $\alpha_n(t)$. The action
$a_{\alpha_n}(t)$ takes one of two values: $-1$ or $+1$. The aggregated demand is
defined as

$$A(t) = \sum_{n=1}^{N} a_{\alpha'_n}(t), \tag{1}$$

where $\alpha'_n$ refers to the best strategy, as defined in Eq. (3) below.
Defined this way, $A(t)$ is the difference between the numbers of agents who
choose the $+1$ and $-1$ actions. Agents do not know each other's actions, but
$A(t)$ is known to all agents. The minority action $a^*(t)$ is determined from
$A(t)$:

$$a^*(t) = -\operatorname{sgn} A(t). \tag{2}$$
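Eqs. (1) and (2) amount to a sum followed by a sign flip. A minimal Python sketch (the function names are hypothetical); choosing an odd $N$ guarantees $A(t) \neq 0$, so the sign is never evaluated at zero.

```python
def aggregated_demand(actions):
    """Eq. (1): A(t) is the sum of the +1/-1 actions of all N agents."""
    if any(a not in (-1, 1) for a in actions):
        raise ValueError("every action must be +1 or -1")
    return sum(actions)


def minority_action(A):
    """Eq. (2): a*(t) = -sgn A(t); returns 0 in the tie A = 0."""
    return -((A > 0) - (A < 0))


actions = [+1, +1, -1, +1, -1]  # N = 5 agents (odd, so A != 0)
A = aggregated_demand(actions)
print(A, minority_action(A))  # 1 -1
```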

Each agent's memory is limited to the $m$ most recent winning, i.e. minority,
decisions. Each agent has the same number $S \geq 2$ of devices, called
strategies, used to predict the next minority action $a^*(t+1)$. The $s$-th
strategy of the $n$-th agent, $\alpha^s_n$ ($s = 1, \ldots, S$), is a function
mapping the sequence $\mu$ of the last $m$ winning decisions to this agent's
action $a_{\alpha^s_n}$. Since there are $P = 2^m$ possible realizations of $\mu$,
there are $2^P$ possible strategies. At the beginning of the game each agent
randomly draws $S$ strategies, according to a given distribution function
$\rho(n): n \to \mathcal{A}_n$, where $\mathcal{A}_n$ is the set of $S$ strategies
of the $n$-th agent.
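A strategy is simply a lookup table with one action per history, which is why there are $2^P$ of them. The sketch below enumerates the strategies, encodes the history $\mu$ as an $m$-bit integer and draws $S$ strategies per agent uniformly. The bit encoding (bit 1 for $a^* = +1$) and all function names are assumptions made for illustration.

```python
import itertools
import random


def all_strategies(m):
    """All 2^P strategies, each a lookup table with one action per history mu."""
    P = 2 ** m
    return list(itertools.product((-1, 1), repeat=P))


def next_history(mu, a_star, m):
    """Append the newest minority decision to the m-bit history index mu.

    Encoding assumption: bit 1 stands for a* = +1 and bit 0 for a* = -1.
    """
    bit = 1 if a_star == 1 else 0
    return ((mu << 1) | bit) & (2 ** m - 1)


def draw_strategy_sets(N, S, m, rng):
    """Each agent draws S strategy indices uniformly, with repetition."""
    n_strategies = 2 ** (2 ** m)
    return [[rng.randrange(n_strategies) for _ in range(S)] for _ in range(N)]


strategies = all_strategies(1)  # m = 1: P = 2 histories, 2^P = 4 strategies
sets = draw_strategy_sets(N=5, S=2, m=1, rng=random.Random(1))
mu = next_history(0b01, +1, m=2)  # older -1, newer +1, then a* = +1 gives 0b11
```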

Each strategy $\alpha^s_n$, belonging to any of the sets $\mathcal{A}_n$, is given a
real-valued function $U_{\alpha^s_n}$ which quantifies the utility of the
strategy: the more preferable the strategy, the higher its utility. Strategies
with higher utilities are more likely to be chosen by agents.

There are various choice policies. In the popular greedy policy each agent
selects the strategy with the highest utility

$$\alpha'_n(t) = \arg\max_{\alpha^s_n \in \mathcal{A}_n} U_{\alpha^s_n}(t). \tag{3}$$

If there are two or more strategies with the highest utility, then one of them
is chosen randomly. The highest-utility strategy (3) used by the agent is
called the active strategy, in contrast to the passive strategies, unused at a
given moment.
However, at any time all agents evaluate all their strategies, both active and
passive. Each strategy $\alpha^s_n$ is given a payoff depending on its action

$$\Delta U_{\alpha^s_n}(t) = -a_{\alpha^s_n}(t)\, g[A(t)], \tag{4}$$

where $g$ is an odd payoff function, e.g. the step-like $g(x) = \operatorname{sgn}(x)$
[2], the proportional $g(x) = x$ or the scaled proportional $g(x) = x/N$. The
learning process corresponds to updating the utility of each strategy

$$U_{\alpha^s_n}(t+1) = U_{\alpha^s_n}(t) + \Delta U_{\alpha^s_n}(t), \tag{5}$$

so that every agent knows how good its strategies are.
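Eqs. (1) to (5) together define one round of the game. The following is a minimal Python sketch of a full greedy-policy simulation, not the authors' code. Its assumptions are stated in the docstring: the full strategy space is drawn uniformly, the initial history is random, $U(0) = 0$, and ties in Eq. (3) are broken randomly. Strategy $k$ is read as a $P$-bit lookup table. Because identical strategies always receive identical payoffs, one utility entry per distinct strategy suffices.

```python
import random


def sgn(x):
    return (x > 0) - (x < 0)


def run_minority_game(N, S, m, T, g=sgn, seed=0):
    """Greedy-policy MG following Eqs. (1)-(5); returns the list of A(t).

    Sketch assumptions: full strategy space drawn uniformly, random initial
    history, U(0) = 0 for all strategies, random tie-breaking in Eq. (3).
    """
    rng = random.Random(seed)
    P = 2 ** m
    n_strategies = 2 ** P

    def action(k, mu):
        # Strategy k is read as a P-bit lookup table: bit mu = 1 means +1.
        return 1 if (k >> mu) & 1 else -1

    owned = [[rng.randrange(n_strategies) for _ in range(S)] for _ in range(N)]
    # Identical strategies always share a utility, so one entry per strategy.
    U = [0] * n_strategies
    mu = rng.randrange(P)
    demand = []
    for _ in range(T):
        A = 0
        for agent in owned:
            best = max(U[k] for k in agent)
            active = rng.choice([k for k in agent if U[k] == best])  # Eq. (3)
            A += action(active, mu)                                   # Eq. (1)
        a_star = -sgn(A)                                              # Eq. (2)
        for k in range(n_strategies):
            U[k] += -action(k, mu) * g(A)                             # Eqs. (4)-(5)
        mu = ((mu << 1) | (1 if a_star == 1 else 0)) % P              # new history
        demand.append(A)
    return demand


A_step = run_minority_game(N=101, S=2, m=2, T=200)                     # g = sgn
A_lin = run_minority_game(N=101, S=2, m=2, T=200, g=lambda x: x)       # g(x) = x
```

With an odd $N$ the demand is always odd, so Eq. (2) never needs a tie rule.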
3. Macroscopic observables

Macroscopic variables are understood here as random variables resulting from
the integration of random variables defined for individuals over subsets of
the degrees of freedom of all individuals in the system. An example of such a
variable is the aggregate demand $A$, defined in the previous section. In this
section we introduce and discuss two other particularly interesting
macroscopic observables, viz. the variance per capita and the predictability.
The variance per capita reflects the coordination between agents and is one
of the most intriguing variables due to its nonmonotonic variation as a
function of the control parameter $N/P$. Generally, the variance per capita is
insensitive to the form of the payoff function. In contrast, the
predictabilities, which detect the existence of patterns, are sensitive to the
payoff. Here, we demonstrate these phenomena focusing mostly on numerical
results. The detailed analytical background is given later in section 4.
Finally, the time dependencies of the aggregate demand and utilities are
presented, providing an insight into the origin of time patterns.

3.0.1. Observables as functions of the control parameter 

The variance per capita $\sigma^2/N$ for a given game is defined using a sample
taken at subsequent time steps during time $T$ and assuming ergodicity of the
process, with

$$\sigma^2 = \frac{1}{T}\sum_{t=0}^{T-1} A(t)^2. \tag{6}$$
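Eq. (6) translates directly into code once a demand series is available, for instance from a simulation. A minimal sketch follows; the function name is hypothetical and, like Eq. (6), it assumes that the time average of $A(t)$ vanishes.

```python
def variance_per_capita(demand, N):
    """sigma^2 / N with sigma^2 = (1/T) * sum_t A(t)^2 as in Eq. (6).

    Like Eq. (6), this assumes the time average of A(t) vanishes.
    """
    T = len(demand)
    return sum(A * A for A in demand) / (T * N)


print(variance_per_capita([1, -1, 3, -3], N=5))  # (1 + 1 + 9 + 9) / (4 * 5) = 1.0
```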

The variance, considered as a function of the control parameter $N/2^m$,
represents a widely discussed result for MGs [?, ?], relevant to economic
applications. For our present study it is important to note that its shape
seems to be insensitive to the form of the payoff function, as shown for two
different payoffs and $m = 3$ and $m = 7$ in Fig. 1. A similar premise for such
payoff independence is given by another macroscopic observable, $H_\alpha/N$,
called predictability, where $H_\alpha$ [22] is defined as

$$H_\alpha = \frac{1}{P}\sum_{\mu=0}^{P-1} \langle a^* | \mu \rangle^2, \tag{7}$$

where $\langle a^* | \mu \rangle$ is the conditional average of $a^*$ given $\mu$, and
the mean is calculated over all $P$ histories.
Figure 1: Variance per capita $\sigma^2/N$ as a function of $N/P$ for $S = 2$, $m = 3$
(left) or $m = 7$ (right). Two different payoff functions are used: full blue
lines correspond to $g(x) = \operatorname{sgn}(x)$ and dashed red lines to $g(x) = x$.
Each point is a mean from ten games, error bars correspond to one standard
deviation, and curves are drawn to guide the eye.



$H_\alpha$ was shown to be useful in detecting two interesting phases of the MG:

• The symmetric phase with $H_\alpha \approx 0$, where after a particular history
$\mu(t)$ both signs of $a^*(t)$ appear with the same frequency. It is often
claimed in the literature [22], [16] that if $H_\alpha = 0$ then patterns in the
time sequence do not exist. We find this condition to be necessary but not
sufficient to state the lack of patterns. For example, if every appearance of
a given $\mu$ is followed alternately by negative and positive minority
decisions, then $H_\alpha = 0$ and yet a predictable pattern exists. Indeed, such
behavior is observed for the MG in the herd regime for $g(x) = x$ [18]. Hence,
$H_\alpha$ measures disproportions in frequencies between positive and negative
minority decisions rather than detecting patterns.

• The asymmetric phase with $H_\alpha > 0$ and existing predictable patterns. In
the asymmetric phase, sign predictions significantly better than random are
possible.
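The counterexample in the first bullet can be checked numerically. The sketch below computes $H_\alpha$ from Eq. (7) for a toy sequence with $m = 1$ in which every history is followed alternately by $+1$ and $-1$. This is the period $2 \cdot 2^m = 4$ behaviour, with $\mu$ updated consistently from $a^*$. The sequence is illustrative and not simulation output. The hypothetical predictor `alternation_hit_rate` guesses, for each $\mu$, the opposite of the decision that followed its previous occurrence.

```python
from collections import defaultdict


def predictability(mu_series, a_star_series, P):
    """Eq. (7): H_alpha = (1/P) * sum over mu of <a*|mu>^2.

    Histories that never occur contribute zero (an assumption of this sketch).
    """
    total, count = defaultdict(float), defaultdict(int)
    for mu, a in zip(mu_series, a_star_series):
        total[mu] += a
        count[mu] += 1
    return sum((total[mu] / count[mu]) ** 2 for mu in count) / P


def alternation_hit_rate(mu_series, a_star_series):
    """Share of correct guesses when predicting, for each mu, the opposite of
    the minority decision that followed the previous occurrence of mu."""
    last, hits, trials = {}, 0, 0
    for mu, a in zip(mu_series, a_star_series):
        if mu in last:
            trials += 1
            hits += (-last[mu] == a)
        last[mu] = a
    return hits / trials


# m = 1 toy sequence: mu(t+1) encodes a*(t) (bit 1 for +1), period 4.
mus = [0, 1, 1, 0, 0, 1, 1, 0]
stars = [1, 1, -1, -1, 1, 1, -1, -1]
print(predictability(mus, stars, P=2))   # 0.0
print(alternation_hit_rate(mus, stars))  # 1.0
```

$H_\alpha$ vanishes although the pattern is perfectly predictable, as stated above.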

As presented in Fig. 2, plots of $H_\alpha/N$ seem to be independent of the payoff
function, similarly to $\sigma^2/N$.

On this basis it was conjectured in the early literature (cf. e.g. Ref. [23])
that only the payoff's evenness is relevant to the macroscopic observables.
The failure of this hypothesis becomes visible when analysing a modified
macroscopic observable, which we call the demand predictability and which may
be useful for predicting the sign of the demand. This variable is defined as

$$H_A = \frac{1}{P}\sum_{\mu=0}^{P-1} \langle A | \mu \rangle^2. \tag{8}$$

Figure 2: Predictability per capita $H_\alpha/N$ as a function of $N/P$ for $S = 2$,
$m = 3$ (left) or $m = 7$ (right). Two different payoff functions are used: full
blue lines correspond to $g(x) = \operatorname{sgn}(x)$ and dashed red lines to
$g(x) = x$. Each point is a mean from ten games, error bars correspond to one
standard deviation, and curves are drawn to guide the eye.



Plots of $H_A/N$ (cf. Fig. 3) exhibit a spectacular sensitivity to the payoff
function in the efficient regime, i.e. at high $N/P$, in contrast to $H_\alpha/N$
and $\sigma^2/N$. For further analysis we decompose the conditional expected
values into components corresponding to decisions $+1$ and $-1$:

$$\langle a^* | \mu \rangle = \langle a^*_+ | \mu \rangle + \langle a^*_- | \mu \rangle, \tag{9}$$

where, formally,

$$\langle a^*_\pm | \mu \rangle = \frac{1}{T_\mu}\sum_{t} a^*(t)\,\delta_{\mu(t),\mu}\,\delta\big(a^*(t), \pm 1\big), \tag{10}$$

with $T_\mu$ the number of time steps at which the history $\mu$ occurs.
Analogously,

$$\langle A | \mu \rangle = \langle A_+ | \mu \rangle + \langle A_- | \mu \rangle, \tag{11}$$

where

$$\langle A_\pm | \mu \rangle = \frac{1}{T_\mu}\sum_{t} A(t)\,\delta_{\mu(t),\mu}\,\delta\big(\operatorname{sgn} A(t), \pm 1\big). \tag{12}$$

Figure 3: Demand predictability per capita $H_A/N$ as a function of $N/P$ for
$S = 2$, $m = 3$ (left) or $m = 7$ (right). Two different payoff functions are used:
full blue lines correspond to $g(x) = \operatorname{sgn}(x)$ and dashed red lines to
$g(x) = x$. Each point is a mean from ten games, error bars correspond to one
standard deviation, and curves are drawn to guide the eye.



The case $H_\alpha = 0$ is possible if $\langle a^* | \mu \rangle = 0$ for every $\mu$,
which, as seen from Eq. (9), requires
$|\langle a^*_+ | \mu \rangle| = |\langle a^*_- | \mu \rangle|$. This means that positive
and negative values of $A(t)$ have to occur with the same frequency. Similarly,
the case $H_A = 0$ happens if $|\langle A_+ | \mu \rangle| = |\langle A_- | \mu \rangle|$ for
every $\mu$, i.e. the positive and negative values of $A$ mutually compensate
(cf. Eq. (11)). Combinations like (i) $H_\alpha = 0$ and $H_A > 0$, and
(ii) $H_\alpha > 0$ and $H_A = 0$, are also possible.
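The two mixed combinations (i) and (ii) can be illustrated with toy demand series for a single history ($P = 1$). The numbers are invented for illustration and are not taken from a simulation. One helper computes $\frac{1}{P}\sum_\mu \langle v | \mu \rangle^2$, which gives $H_\alpha$ of Eq. (7) for $v = a^*$ and $H_A$ of Eq. (8) for $v = A$. A second helper returns the split of Eq. (12). All function names are hypothetical.

```python
from collections import defaultdict


def conditional_mean_square(mu_series, values, P):
    """(1/P) * sum over mu of <value|mu>^2.

    values = a*(t) gives H_alpha of Eq. (7); values = A(t) gives H_A of Eq. (8).
    Histories that never occur contribute zero (an assumption of this sketch).
    """
    total, count = defaultdict(float), defaultdict(int)
    for mu, v in zip(mu_series, values):
        total[mu] += v
        count[mu] += 1
    return sum((total[mu] / count[mu]) ** 2 for mu in count) / P


def split_demand(mu_series, demand, mu):
    """Eq. (12): <A_+|mu> and <A_-|mu>, both normalized by T_mu."""
    picked = [A for m_, A in zip(mu_series, demand) if m_ == mu]
    T_mu = len(picked)
    return (sum(A for A in picked if A > 0) / T_mu,
            sum(A for A in picked if A < 0) / T_mu)


def sgn(x):
    return (x > 0) - (x < 0)


case_i = [3, -1, 3, -1]  # signs balance, magnitudes do not: H_alpha = 0, H_A > 0
case_ii = [1, 1, -2]     # magnitudes balance, signs do not: H_alpha > 0, H_A = 0
for demand in (case_i, case_ii):
    mus = [0] * len(demand)
    stars = [-sgn(A) for A in demand]
    print(conditional_mean_square(mus, stars, 1),
          conditional_mean_square(mus, demand, 1),
          split_demand(mus, demand, 0))
```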

3.1. Observables as functions of time 

In order to examine MGs in the efficient regime, we performed a series of
numerical simulations with different combinations of game parameters. We chose
three representative cases, $(m, N) = (1, 401), (2, 1601), (5, 1601)$, all with
$S = 2$ strategies per agent. All three games are in the efficient mode. In the
first two cases the condition $NS \gg 2^P$ is fulfilled. In the third one it is
not met, and the consequences of this fact will become clear later in the
text. In all three experiments the full strategy space is used.

Figs. 4, 5 and 6 present results for the step-like payoff function
$g(x) = \operatorname{sgn}(x)$: the time evolution of $A(t)$, the autocorrelation
function $R(\tau)$ and the scatter plots of $A(t + 2 \cdot 2^m)$ against $A(t)$,
respectively. The same results for the proportional payoff function $g(x) = x$
are given in Figs. 7, 8 and 9.

Even a fleeting glance at Figs. 4 and 7 reveals regularities in $A(t)$ for both
payoff functions, more regular and distinct for $g(x) = x$. In this case their
period increases with the memory length $m$ and their maximal values are equal
to half of the population size, $N/2$. This periodicity can be better seen
using the autocorrelation function $R(\tau)$ (cf. Figs. 5 and 8), where $\tau$ is
the correlation time. The autocorrelation $R$ exhibits statistically periodic
peaks with
Figure 4: Time evolution of the aggregated demand $A(t)$ for three combinations
of the population size $N$ and agent memory $m$: $N = 401$, $m = 1$ (left),
$N = 1601$, $m = 2$ (middle) and $N = 1601$, $m = 5$ (right). Simulations were done
for $S = 2$ and $g(x) = \operatorname{sgn}(x)$. Preferred values of $A$ are visible
for all three games.




Figure 5: Autocorrelation function $R(\tau)$ for three combinations of the
population size $N$ and agent memory $m$: $N = 401$, $m = 1$ (left), $N = 1601$,
$m = 2$ (middle) and $N = 1601$, $m = 5$ (right). Simulations were done for $S = 2$
and $g(x) = \operatorname{sgn}(x)$. Apart from $\tau = 0$, the highest values of $R$
occur at $\tau = 2 \cdot 2^m$ for all games fulfilling the $NS \gg 2^P$ condition.






Figure 6: Plots of the aggregated demand $A(t + 2 \cdot 2^m)$ vs. $A(t)$ for three
combinations of the population size $N$ and agent memory $m$: $N = 401$, $m = 1$
(left), $N = 1601$, $m = 2$ (middle) and $N = 1601$, $m = 5$ (right). Simulations
were done for $S = 2$ and $g(x) = \operatorname{sgn}(x)$. Apparent preferred levels of
$A(t)$ are seen as clusters of points. For $m = 1$ and $m = 2$ points tend to flock
around the diagonals, indicating positive correlation for $\tau = 2 \cdot 2^m$.
Figure 7: Time evolution of the aggregated demand $A(t)$ for three combinations
of the population size $N$ and agent memory $m$: $N = 401$, $m = 1$ (left),
$N = 1601$, $m = 2$ (middle) and $N = 1601$, $m = 5$ (right). Simulations were done
for $S = 2$ and $g(x) = x$. Preferred values of $A$ are visible for all three games.




Figure 8: Autocorrelation function $R(\tau)$ for three combinations of the
population size $N$ and agent memory $m$: $N = 401$, $m = 1$ (left), $N = 1601$,
$m = 2$ (middle) and $N = 1601$, $m = 5$ (right). Simulations were done for $S = 2$
and $g(x) = x$. Apart from $\tau = 0$, the highest values of $R$ occur at
$\tau = 2 \cdot 2^m$ for all games fulfilling the $NS \gg 2^P$ condition.






Figure 9: Plots of the aggregated demand $A(t + 2 \cdot 2^m)$ vs. $A(t)$ for three
combinations of the population size $N$ and agent memory $m$: $N = 401$, $m = 1$
(left), $N = 1601$, $m = 2$ (middle) and $N = 1601$, $m = 5$ (right). Simulations
were done for $S = 2$ and $g(x) = x$. For $m = 1$ and $m = 2$ points tend to flock
around the diagonals, indicating positive correlation, but the clustering of
points is less pronounced.
periods $T = 2 \cdot 2^m$, as has already been observed in the efficient regime
in Refs. [20, ?]. The autocorrelation is much less pronounced for games which do
not meet the criterion $NS \gg 2^P$, as seen in Figs. 5 and 8 (right). Relaxation
of this criterion spoils the periodicity of the aggregated demand. Similar
observations can be made by inspecting the $A(t + 2 \cdot 2^m)$ vs. $A(t)$ scatter
plots in Figs. 6 and 9, where points for games fulfilling the $NS \gg 2^P$
condition (left and middle panels) flock more strongly around the diagonals.
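The periodicity read off Figs. 5 and 8 corresponds to the lag $\tau > 0$ at which the sample autocorrelation peaks. A minimal sketch follows, normalized so that $R(0) = 1$. The input is a synthetic period-4 series, i.e. $2 \cdot 2^m$ for $m = 1$, with invented values rather than simulation output. Real demand series, e.g. from a simulation, can be passed the same way.

```python
def autocorrelation(series, tau):
    """Sample autocorrelation R(tau), normalized so that R(0) = 1."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    var = sum(d * d for d in dev)
    return sum(dev[t] * dev[t + tau] for t in range(n - tau)) / var


def strongest_lag(series, max_tau):
    """Lag tau in 1..max_tau at which R(tau) is largest."""
    return max(range(1, max_tau + 1), key=lambda tau: autocorrelation(series, tau))


# Synthetic series with period 4 = 2 * 2^m for m = 1 (illustrative numbers only).
A = [5, 3, -5, -3] * 50
print(strongest_lag(A, 6))  # 4
```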

Another interesting feature of the aggregated demand, seen in the
one-dimensional plots of $A(t)$ and better in the two-dimensional plots of
$A(t + 2 \cdot 2^m)$ vs. $A(t)$, is the existence of preferred values of $A$. These
preferred values show up as clusters in the two-dimensional plots. The
clusters are better focused and more numerous for $g(x) = \operatorname{sgn}(x)$
(Fig. 6) than for $g(x) = x$ (Fig. 9).

The time evolution of the utility functions appears to be a strongly
mean-reverting process, independently of the payoff function, as seen e.g. in
Fig. 10. Moreover, for the step-like payoff $g(x) = \operatorname{sgn}(x)$ the
utility is confined to the rather narrow band $-2^m < U(t) < 2^m$, where here and
in Fig. 10 $U(t)$ stands for the utility of any strategy. The formal proof of
this statement is given in section 4. This feature is observed for any $N$ and
$S$, provided the criterion $NS \gg 2^P$ is met.




Figure 10: Trajectories of the utility function $U(t)$ for all strategies of the
MG with $S = 2$, $m = 1$ and $N$ high enough to ensure the $NS \gg 2^P$ regime. Two
payoff functions are shown: the step-like $g(x) = \operatorname{sgn}(x)$ (left) and
the proportional $g(x) = x$ (right). Lines correspond to all different
strategies. Note the difference in vertical scales between panels.
4. The mesoscopic perspective



In this section we present an effective description of the MG by redefining
it as a Markov chain. The general definition of state is found to be too
complex for analytical treatment (cf. Sec. 4.1.1). Fortunately, in the herd
regime many agents have identical sets of strategies and their aggregation is
possible. The set of individuals using the same strategies, called a fraction,
is further treated as a single agent (cf. Sec. 4.1.2). All possible fractions
exist provided the game is large enough. Knowledge of the utilities of
pairwise different strategies and of the history of past winning decisions is
enough to predict the action of any fraction. This set of parameters fully
characterizes the system and is considered to specify its state. This
definition is strictly suitable only for a step-like payoff function and may
vary slightly for other payoffs (cf. Secs. 4.2 and 4.3).

Once the representation of the state is known, two methodologies are used to
explain the observations. In the simplified case we assume quenched disorder
[17], i.e. a random choice of the strategy sets at the start of the game and
their subsequent fixation, and in addition equal fraction sizes. However, not
all observables are properly explained by this means, and an extension of
these assumptions is needed.

Transition probabilities can be calculated in two ways: before and after the
assignment of strategies to agents. We thus distinguish between the a priori
and a posteriori probability distributions of the aggregate demand.

Using this approach we manage to explain all observed phenomena. Finally, in
Sec. 4.1.3 we define and study the stability of this game in order to
understand asymmetries observed in aggregate variables.

4.1. Definitions

4.1.1. The general concept of state 

Since the MG represents a system with many degrees of freedom, the
dimensionality of states is expected to be large. In general, for each time
step $t$, the specification of the state $x(t)$ consists of:

A. The history of decisions $\mu(t)$,

B. The set of strategies of all agents $\{\alpha^s_n\}_{n=1,\ldots,N;\ s=1,\ldots,S}$,

C. The set of utilities of all strategies of all agents $\{U_{\alpha^s_n}(t)\}_{n=1,\ldots,N;\ s=1,\ldots,S}$,

D. A function relating strategies to agents: $\rho(n): n \to \mathcal{A}_n$.

Although the history of decisions $\mu(t)$ partially stores information about
the past of the process, the transition probabilities depend only on the
present state, so the process is Markovian.

A substantial reduction of the number of state parameters and a
simplification of the state description are possible if the game is large,
i.e. $NS \gg 2^P$ (cf. Secs. 4.2 and 4.3).

4.1.2. Fraction: definition and statistical properties

All agents behaving in the same manner, i.e. a fraction, can in a sense be
treated as a whole. The fraction can be defined in two ways.

In the first approach it is a set of agents possessing a given, identical set
of $S$ strategies. The set of pairwise different strategies is denoted as
$\{\beta_\kappa\}_{\kappa=1}^{2^P}$. The number of agents in fraction $\nu$, or the
size of this fraction, is denoted by $F_\nu$, where $\nu \in \{1, \ldots, G\}$ and
$G$ is the total number of different fractions. In large games, the system
comprises agents of all possible fractions, which results in a constant $G$. In
general, if strategies are assigned to agents randomly, then the $F_\nu$ are
random variables. The strategy space consists of $2^P$ possible strategies, and
$G$ is given by the number of $S$-combinations with repetition:
$G = \binom{2^P + S - 1}{S}$.

However, this definition makes the expected fraction sizes $E[F_\nu]$ unequal for
different fractions, provided that strategies are randomly chosen from the
uniform distribution. For example, assuming $S = 2$, a fraction with two
identical strategies, e.g. $\{\beta_1, \beta_1\}$, is on average two times smaller
than a fraction with different strategies $\beta_1$ and $\beta_2$, because the
latter can be drawn in either order: $(\beta_1, \beta_2)$ or $(\beta_2, \beta_1)$.
Therefore, in the sequel we use another definition: a fraction is a set of
agents using a given ordered sequence of strategies. The number of fractions
is now $G = 2^{PS}$. In this definition the strategy index $s \in \{1, \ldots, S\}$
is a dummy label. Nevertheless, we use this approach because it radically
simplifies the analysis without biasing the outcome, assuming that agents are
assigned to fractions with equal probabilities. For example, consider the case
$S = 2$. Fraction indices are assigned to each ordered pair of strategies
arbitrarily, e.g. as presented in Table 1.



2 Two strategies are called different if the Hamming distance between them is
nonzero. The number of pairwise different strategies is equal to $2^P$.
        β1    β2    β3    β4
  β1     1     2     3     4
  β2     5     6     7     8
  β3     9    10    11    12
  β4    13    14    15    16

Table 1: One possible assignment between fraction indices $\nu$ and ordered pairs
of strategies (rows: first strategy, columns: second strategy) for $S = 2$ and
$2^P = 4$ strategies.
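The two ways of counting fractions, and the factor of two between fractions with identical and different strategies, can be verified by enumerating all equally likely ordered draws. A short sketch using Python's standard `math.comb` and `itertools.product`, with helper names of our own choosing:

```python
from collections import Counter
from itertools import product
from math import comb


def n_unordered_fractions(P, S):
    """First definition: S-combinations with repetition, C(2^P + S - 1, S)."""
    return comb(2 ** P + S - 1, S)


def n_ordered_fractions(P, S):
    """Second definition: ordered sequences of S strategies, 2^(P S)."""
    return 2 ** (P * S)


def draws_per_unordered_fraction(P, S):
    """Number of equally likely ordered draws that fall into each unordered set."""
    return Counter(tuple(sorted(seq)) for seq in product(range(2 ** P), repeat=S))


w = draws_per_unordered_fraction(P=2, S=2)  # 4 strategies, as in Table 1
print(n_unordered_fractions(2, 2), n_ordered_fractions(2, 2))  # 10 16
print(w[(0, 0)], w[(0, 1)])  # 1 2
```

Each of the 16 ordered pairs is equally likely, whereas the unordered set $\{\beta_1, \beta_2\}$ collects two of them and $\{\beta_1, \beta_1\}$ only one.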



If at the beginning of the game strategies are drawn with equal probabilities,
this corresponds to assigning agents to a specific fraction with probability
$1/G$. Let $W^n_\nu \in \{0, 1\}$ be a random variable equal to 1 if agent $n$
belongs to fraction $\nu$. Then $F_\nu = \sum_{n=1}^{N} W^n_\nu$ follows the binomial
distribution, $\Pr(F_\nu = f_\nu) = \binom{N}{f_\nu} G^{-f_\nu} (1 - 1/G)^{N - f_\nu}$.
Hence, $E[F_\nu] = N/G$ or, if normalized,

$$E[F_\nu/N] = 1/G.$$

For $N \to \infty$, we have $\operatorname{Var}[F_\nu] \to \infty$ and
$\operatorname{Var}[F_\nu/N] \to 0$ (footnote 3). This means that, asymptotically for
large $N$, (i) the absolute differences between the sizes of fractions grow
indefinitely, and (ii) the shares of the population assigned to the different
fractions become equal. Hence, the larger the population, the larger the
expected difference between the actual size of a fraction $F_\nu$ and its
expected value $E[F_\nu]$.
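The two limits can be seen directly from the binomial moments $\operatorname{Var}[F_\nu] = N\frac{1}{G}(1 - \frac{1}{G})$ and $\operatorname{Var}[F_\nu/N] = \frac{1}{NG}(1 - \frac{1}{G})$. The sketch below evaluates them and also samples fraction sizes by assigning each agent to one of $G$ fractions with probability $1/G$. The choice $G = 16$ (ordered pairs of four strategies, $S = 2$) is only an example.

```python
import random


def fraction_size_moments(N, G):
    """Mean and variance of F ~ Binomial(N, 1/G) and of the share F / N."""
    p = 1.0 / G
    mean, var = N * p, N * p * (1 - p)
    return mean, var, mean / N, var / N ** 2


def sample_fraction_sizes(N, G, rng):
    """Assign each of N agents to one of G fractions with probability 1/G."""
    sizes = [0] * G
    for _ in range(N):
        sizes[rng.randrange(G)] += 1
    return sizes


G = 16  # e.g. ordered pairs (S = 2) of 2^P = 4 strategies
for N in (100, 10_000):
    _, var, _, var_share = fraction_size_moments(N, G)
    print(N, var, var_share)  # Var[F] grows with N while Var[F/N] shrinks
sizes = sample_fraction_sizes(10_000, G, random.Random(0))
```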

4.1.3. Stability 

The game is considered stable if, for any strategy $a_n$, the corresponding utility $U_{a_n}(t)$ is a mean-reverting stochastic process, i.e. the time average of its increments vanishes after a sufficiently long time. The MG has a built-in stabilization mechanism, provided the game is large enough. The explanation is as follows.

Imagine that a subset $Z$ of strategies ($Z \subset \{\beta_1, \ldots, \beta_{2^P}\}$) gets on average a higher payoff than other subsets, so the utilities in $Z$ grow. Then there always exists the same number of anticorrelated strategies with decreasing utility. The probability that an agent uses one of the strategies with a high utility is $1 - (\#Z/2^P)^S$. For an agent using a low-utility strategy the probability is $(\#Z/2^P)^S$ [18], where $\#Z$ is the number of elements in $Z$. The former probability is always higher, provided $S \geq 2$. Most of the population therefore uses the better strategies and their utility decreases, i.e. the game stabilizes. As long as fraction sizes are close to each other, the above mechanism works and the game stays stable.

³After normalization, the random variable $Z_n^\nu = W_n^\nu / N \in \{0, 1/N\}$ obeys the Bernoulli distribution with $\Pr(Z_n^\nu = 0) = 1 - 1/G$ and $\Pr(Z_n^\nu = 1/N) = 1/G$. Hence, $E[Z_n^\nu] = 1/(NG)$ and $\mathrm{Var}[Z_n^\nu] = \frac{1}{N^2 G}\left(1 - \frac{1}{G}\right)$. Consequently, $\mathrm{Var}[F_\nu / N] = \sum_{n=1}^{N} \mathrm{Var}[Z_n^\nu] = \frac{1}{NG}\left(1 - \frac{1}{G}\right)$.
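The two probabilities in the stability argument can be evaluated exactly. The sketch below only restates the expressions quoted above, $1 - (\#Z/2^P)^S$ and $(\#Z/2^P)^S$, using Python's `fractions.Fraction`. The function name `usage_probabilities` and the example sizes are our own choices.

```python
from fractions import Fraction


def usage_probabilities(size_z: int, p: int, s: int) -> tuple[Fraction, Fraction]:
    """(Pr[high-utility strategy in use], Pr[low-utility strategy in use]) as in the text:
    1 - (#Z/2^P)^S and (#Z/2^P)^S, for P histories and S strategies per agent."""
    if not 0 < 2 * size_z <= 2 ** p:
        raise ValueError("Z and its anticorrelated partner must fit among 2^P strategies")
    ratio = Fraction(size_z, 2 ** p)
    low = ratio ** s
    return 1 - low, low


if __name__ == "__main__":
    # m = 1 -> P = 2^m = 2 histories and 2^P = 4 strategies; Z holds half of them
    for s in (1, 2, 3):
        high, low = usage_probabilities(size_z=2, p=2, s=s)
        print(f"S={s}: high={high}, low={low}")
```

When $Z$ covers half of the four $m = 1$ strategies, $S = 1$ gives equal probabilities. With $S = 2$ the split is already 3/4 against 1/4.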

4.2. The payoff g(x) = sgn(x)

Here, the concept of the state for the payoff g(x) = sgn(x) is introduced. Applying it allows us to represent the game as a Markov process and provides a consistent basis for analytical explanations of phenomena in the herd regime.

4.2.1. The concept of the state

In our case the number of state parameters can be substantially reduced and the state description simplified. Agents can use identical strategies. For $N \to \infty$, the expected number of identical strategies in the whole population behaves asymptotically like $NS/2^P$. The condition $NS \gg 2^P$ ensures that the game stays in this asymptotic regime and that the number of identical strategies is close to its asymptotic expected value. Identical strategies have the same utilities over the whole game, provided all strategies start with the same utility, e.g. $U(0) = 0$. It is thus enough to consider only the reduced set of pairwise different strategies $\{\beta_\kappa\}_{\kappa=1}^{2^P}$ and the utilities defined on them. Therefore B and C from Section 4.1.1 can be reduced:

B. $\to \{\beta_\kappa\}_{\kappa=1}^{2^P}$,  C. $\to \{U_{\beta_\kappa}(t)\}_{\kappa=1}^{2^P}$.

Concerning point D, it is sufficient to find the probabilities for agents to have strategies from the set of pairwise different strategies. The probability that a given agent has any particular strategy from this set is equal to $1 - (1 - 1/2^P)^S$. For large $N$, the expected number of agents having this strategy is equal to $N\left(1 - (1 - 1/2^P)^S\right)$. Point D is the function ascribing strategies to agents, which corresponds to the agent grouping tensor $\Omega$ of Ref. [?]. It can therefore be dropped entirely in this case. Note that this expected number in general differs from the actual number. This has some consequences, explained later.
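As a quick check of the formula $N(1 - (1 - 1/2^P)^S)$, the hypothetical helpers below compute it exactly. They also compare it with a Monte Carlo draw of $S$ strategies per agent. The draws are uniform and with replacement, which is the assumption behind the formula. For $m = 1$ ($P = 2$) and $S = 2$ the probability is $1 - (3/4)^2 = 7/16$.

```python
import random
from fractions import Fraction


def prob_agent_holds(p: int, s: int) -> Fraction:
    """1 - (1 - 1/2^P)^S: chance that one of S uniform draws hits a given strategy."""
    return 1 - (1 - Fraction(1, 2 ** p)) ** s


def expected_holders(n_agents: int, p: int, s: int) -> Fraction:
    """Large-N expected number of agents holding that strategy."""
    return n_agents * prob_agent_holds(p, s)


def simulated_holders(n_agents: int, p: int, s: int, seed: int = 0) -> int:
    """Monte Carlo count of agents holding strategy 0 among their S draws."""
    rng = random.Random(seed)
    n_strategies = 2 ** p
    return sum(
        1
        for _ in range(n_agents)
        if 0 in [rng.randrange(n_strategies) for _ in range(s)]
    )


if __name__ == "__main__":
    # m = 1 -> P = 2 (four strategies), S = 2, N = 400 as in Figs. 11 and 14
    print(prob_agent_holds(2, 2), expected_holders(400, 2, 2), simulated_holders(400, 2, 2))
```

A single simulated count fluctuates around the expected value. This is the difference between expected and actual numbers mentioned above.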

Finally, we describe states using $\mu(t)$ and the utilities of the complete set of $2^P$ pairwise different strategies $\{\beta_\kappa\}_{\kappa=1}^{2^P}$:

$$x(t) = \left[\mu(t), U_1(t), U_2(t), \ldots, U_{2^P}(t)\right]. \tag{13}$$

A similar description of the state was used in Ref. [?], but the two approaches differ in two important ways. (i) The authors of Ref. [?] introduce a functional map giving the time evolution of the system in any regime. (ii) They degenerate the game by following mean values of the demand. This makes the process deterministic and Markovian while retaining the possibility to randomize it perturbatively. In contrast, we do not degenerate the game. We consider it as a stochastic Markov process and eventually calculate the probability measure on states for the steplike payoff.

Utilities $\{U_{\beta_\kappa}(t)\}_{\kappa=1}^{2^P}$, considered as functions of time, are called trajectories. In most cases, and provided enough time steps are observed, strategies can be distinguished by their trajectories. A sufficient condition for all $2^P$ trajectories $U_{\beta_\kappa}(t)$ ($0 \le t \le t_0$) to be distinguishable at $t_0$ is that all $2^m$ possible histories $\mu$ appear consecutively before then. On the other hand, a necessary condition is that all histories $\mu$ appear before $t_0$, though not necessarily consecutively. Examples of MGs in the regime $NS \gg 2^P$ are shown in Fig. 10. There, trajectories are plotted for $m = 1$ and $S = 2$, and for the two payoff functions studied further in this paper: g(x) = sgn(x) and g(x) = x.

4.2.2. Finiteness of the number of states

In this section we demonstrate that, for any $t$, the utility of any strategy is bounded from below and above: $U_{\min} \le U(t) \le U_{\max}$, where $U_{\min} = -2^m$ and $U_{\max} = +2^m$. At least two approaches are possible. The first aggregates agents using strategies of a given utility value. The second is based on fractions. Here we elaborate on the former in detail and only sketch the proof of the latter.

Assume that at a given time $t$ two different strategies have the same utilities. From Eq. (5) for the steplike payoff function, after one time step these utilities either differ by two units or remain the same. Suppose the utilities start from equal values at $t = 0$ and, after $\tau$ time steps, at least one of them attains its extremal value, $U_{\min}$ or $U_{\max}$. Then the trajectories cover the set of $2^m + 1$ values (cf. Fig. 10, left)

$$U(\tau) \in \{u_l\}_{l=1}^{2^m+1} = \{2^m, 2^m - 2, \ldots, 2, 0, -2, \ldots, -2^m + 2, -2^m\}. \tag{14}$$

In this notation, $u_1 = U_{\max}$ and $u_{2^m+1} = U_{\min}$. The number of different strategies sharing the level $u_l$ follows from combinatorics. It equals the number of trajectories that start from zero and end at $u_l$:

$$\#\{\beta_\kappa : U_{\beta_\kappa}(\tau) = u_l\} = \binom{2^m}{l-1}, \qquad l = 1, \ldots, 2^m + 1. \tag{15}$$
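The count in Eq. (15) can be checked by brute force under one simple scenario consistent with Eq. (14). We assume that by time $\tau$ each of the $P = 2^m$ histories has occurred once, with a fixed minority action. A strategy agreeing with $k$ of those actions then has utility $2k - 2^m$. The outcome vector `winning` is an arbitrary illustrative choice, and the function names are our own.

```python
from collections import Counter
from itertools import product
from math import comb


def utility_levels(m: int, winning: tuple[int, ...]) -> Counter:
    """Count strategies per utility level after each of the P = 2^m histories has
    occurred once with minority action winning[mu] (an illustrative assumption).

    A strategy is a tuple of P actions in {-1, +1}; with the steplike payoff each
    history adds +1 if the strategy matched the minority action and -1 otherwise.
    """
    p = 2 ** m
    if len(winning) != p:
        raise ValueError("need one minority action per history")
    levels = Counter()
    for strategy in product((-1, +1), repeat=p):
        levels[sum(a * w for a, w in zip(strategy, winning))] += 1
    return levels


def predicted_counts(m: int) -> dict[int, int]:
    """Eq. (15): level u_l = 2^m - 2(l - 1) holds binom(2^m, l - 1) strategies."""
    p = 2 ** m
    return {p - 2 * (l - 1): comb(p, l - 1) for l in range(1, p + 2)}


if __name__ == "__main__":
    counts = utility_levels(2, winning=(+1, -1, -1, +1))
    print(sorted(counts.items(), reverse=True))   # (level, count) from u_1 = 4 to u_5 = -4
    assert dict(counts) == predicted_counts(2)
```

For $m = 1$ this gives one strategy at $+2$, two at $0$ and one at $-2$. This matches, for instance, the utilities in the state $x_5 = [-1, 0, -2, 2, 0]$ discussed below.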
The probability that the active strategy of the $n$-th agent, $a'_n$, has utility $u_l$ is equal to

$$\Pr[U_{a'_n}(t) = u_l] = \begin{cases} 1 - \Pr[U_{a'_n}(t) < u_l], & l = 1, \\ \Pr[U_{a'_n}(t) < u_{l-1}] - \Pr[U_{a'_n}(t) < u_l], & l > 1. \end{cases} \tag{16}$$

Using an argument similar to that of Ref. [17], but extended to the full strategy space, one finds that

$$\Pr[U_{a'_n}(t) < u_l] = \prod_{s=1}^{S} \left[1 - \Pr[U_{a_{s,n}}(t) \ge u_l]\right] = \left[1 - \frac{\#\{\beta_\kappa : U_{\beta_\kappa}(t) \ge u_l\}}{2^P}\right]^S, \tag{17}$$

where, for $t = \tau$,

$$\#\{\beta_\kappa : U_{\beta_\kappa}(\tau) \ge u_l\} = \sum_{j=1}^{l} \binom{2^m}{j-1}. \tag{18}$$

Denote $\Pr_{\max(\min)} = \Pr[U_{a'_n}(t) = U_{\max(\min)}]$. Eqs. (17) and (18) then show that $\Pr_{\max} > \Pr_{\min}$. For any utility $u_l$ other than $U_{\min}$ and $U_{\max}$, the number of different strategies (15) is even. Moreover, at each level $U_{\min} < u_l < U_{\max}$, half of the strategies suggest the action opposite to that of the other half. By Eq. (17), strategies sharing the same utility all have the same probability of being the $n$-th agent's best strategy. Excluding the best and the worst strategies, half of the remaining strategies therefore recommend the same action as the best strategy, and the other half the same action as the worst one. Hence the probability that an agent plays a strategy suggesting the same action as the best strategy is equal to



$$\Pr[a'_n(t) \simeq a_B(t)] = \Pr_{\max} + \frac{1}{2}\left(1 - \Pr_{\max} - \Pr_{\min}\right), \tag{19}$$



Here $a_B(t)$ is the best strategy in the whole game, i.e. $U_{a_B(t)} = u_1$. The quantity $1 - \Pr_{\max} - \Pr_{\min}$ is the probability that the agent's best strategy is neither the best nor the worst of all strategies. The factor $\frac{1}{2}$ reflects that half of the strategies with non-extremal utilities suggest the same action as the best one. Since $\Pr_{\max} > \Pr_{\min}$, Eq. (19) implies that once a strategy reaches $U_{\max}$, more than half of the population plays the action suggested by the best strategy. This subpopulation then loses and receives the negative payoff, while the rest win and receive the positive payoff. This mechanism keeps the utility between $U_{\min}$ and $U_{\max}$. It also gives a formula for the fraction of agents playing the same action. For example, for $S = 2$ and $m = 1$, $\Pr_{\max} = 7/16$ and $\Pr_{\min} = 1/16$, so Eq. (19) gives $11/16$. Taking the action of the best strategy as $+1$, $E[A] = (11/16 - 5/16)N = \frac{3}{8}N$.
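Eqs. (16)-(19) can be evaluated mechanically. The sketch below assumes the level occupancies of Eq. (15), i.e. it describes the moment $\tau$ at which the utilities span $\{u_l\}$. It uses exact rational arithmetic, and the function names are hypothetical. For $m = 1$, $S = 2$ it reproduces $\Pr_{\max} = 7/16$, $\Pr_{\min} = 1/16$ and $E[A] = 3N/8$.

```python
from fractions import Fraction
from math import comb


def level_probabilities(m: int, s: int) -> list[Fraction]:
    """Pr[U_{a'_n} = u_l], l = 1..2^m+1, from Eqs. (16)-(18) at an extremal time tau."""
    p = 2 ** m                                              # number of histories P
    n_strategies = 2 ** p
    counts = [comb(p, j) for j in range(p + 1)]             # Eq. (15): strategies on u_{j+1}
    at_least = [sum(counts[: j + 1]) for j in range(p + 1)]  # Eq. (18): #{U >= u_{j+1}}
    below = [(1 - Fraction(k, n_strategies)) ** s for k in at_least]  # Eq. (17)
    probs = [1 - below[0]]                                  # Eq. (16), l = 1
    probs += [below[j - 1] - below[j] for j in range(1, p + 1)]  # Eq. (16), l > 1
    return probs


def best_action_share(m: int, s: int) -> Fraction:
    """Eq. (19): Pr_max + (1 - Pr_max - Pr_min) / 2."""
    probs = level_probabilities(m, s)
    pr_max, pr_min = probs[0], probs[-1]
    return pr_max + (1 - pr_max - pr_min) / 2


def expected_demand_per_capita(m: int, s: int) -> Fraction:
    """E[A]/N, counting the action of the best strategy as +1."""
    share = best_action_share(m, s)
    return share - (1 - share)


if __name__ == "__main__":
    probs = level_probabilities(1, 2)
    print("Pr_max =", probs[0], " Pr_min =", probs[-1])
    print("share =", best_action_share(1, 2), " E[A]/N =", expected_demand_per_capita(1, 2))
```

We use exact fractions rather than floats so that the small rational values quoted in the text can be compared directly.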

Analogous results are obtained with the concept of fractions. The number of different strategies at level $u_l$ follows Eq. (15). Moreover, at every intermediate level $U_{\min} < u_l < U_{\max}$, equally many strategies suggest $+1$ and $-1$. Fractions using one of these intermediate strategies therefore cancel each other's decisions on average. It remains to count the fractions that use the best and the worst strategy: there are $2^{PS} - (2^P - 1)^S$ and 1 of them, respectively. For example, for $S = 2$ and $m = 1$ there are $G = 2^{PS} = 16$ fractions, seven using the best strategy and one using the worst. Hence $E[A] = \frac{7}{16}N - \frac{1}{16}N = \frac{3}{8}N$, in agreement with the previous example.
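The fraction count can be cross-checked by enumeration. This sketch orders strategies by utility (index 0 is the best, the last index the worst) and enumerates ordered $S$-tuples, which is an assumption mirroring the fraction definition above. It compares the closed-form counts with a brute-force count. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import product


def fraction_counts(p: int, s: int) -> tuple[int, int, int]:
    """(G, #fractions containing the best strategy, #fractions made only of the worst one)."""
    g = 2 ** (p * s)
    with_best = g - (2 ** p - 1) ** s   # tuples avoiding the best one: (2^P - 1)^S
    only_worst = 1                      # every entry equals the worst strategy
    return g, with_best, only_worst


def brute_force_counts(p: int, s: int) -> tuple[int, int, int]:
    """Same counts by enumerating ordered S-tuples; index 0 = best, last index = worst."""
    n = 2 ** p
    tuples = list(product(range(n), repeat=s))
    with_best = sum(1 for t in tuples if 0 in t)
    only_worst = sum(1 for t in tuples if all(k == n - 1 for k in t))
    return len(tuples), with_best, only_worst


def expected_demand_from_fractions(p: int, s: int) -> Fraction:
    """E[A]/N, assuming fractions with intermediate best strategies cancel on average."""
    g, with_best, only_worst = fraction_counts(p, s)
    return Fraction(with_best - only_worst, g)


if __name__ == "__main__":
    print(fraction_counts(2, 2), expected_demand_from_fractions(2, 2))
```

For $P = 2$, $S = 2$ this returns `(16, 7, 1)` and `3/8`. In general the result equals $\Pr_{\max} - \Pr_{\min} = \big(2^{PS} - (2^P - 1)^S - 1\big)/2^{PS}$, which is why both approaches agree.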

4.2.3. The Markov process representation

The MG can be described as a Markov process with a finite number of states. The value of $\mathrm{sgn}\,A(x_i)$, which is $\pm 1$, fully determines the utilities and the $\mu$ value of the next state. In some specific states, however, $A(x_i)$ is always positive or always negative, so only one value of $\mathrm{sgn}\,A(x_i)$ appears. Hence the transition may be either stochastic or deterministic, and the transition probability $\Pr(x_j|x_i)$ is given by Eq. (20)⁴. The aggregated demand in the state $x_i$ can be written as

$$A(x_i) = \sum_{\nu=1}^{G} C_\nu(x_i) F_\nu, \tag{21}$$

where $C_\nu(x_i) \in [-1, 1]$ is the common action of all members of fraction $\nu$ in the state $x_i$:

$$C_\nu(x_i) = \frac{1}{F_\nu} \sum_{n=1}^{N} W_n^\nu\, a_n(x_i). \tag{22}$$

In other words, $C_\nu(x_i)$ is the aggregated demand per capita within fraction $\nu$. It depends on the action suggested in the state $x_i$ by the best strategy, or strategies, of the $\nu$-th fraction.

⁴The lack of an explicit dependence of $\Pr(x_j|x_i)$ on $x_j$ in Eq. (20) does not mean that both transition probabilities of a stochastic transition are the same. They can differ if the distribution of $A(x_i)$ is asymmetric (cf. discussion in Sec. 4.4.1 below).
There are the following groups of fractions:

• Fractions with only one best strategy in the state $x_i$. All agents in the fraction act according to this strategy.

• Fractions with several best strategies, all of which suggest the same action in the state $x_i$. Although agents use different strategies, they all act identically.

• Fractions with several best strategies, some of which suggest the action opposite to the others in the state $x_i$. Actions of agents are then inhomogeneous, and the overall action of such a fraction is a random variable $C_\nu(x_i)$. It takes the values $C_\nu(x_i) F_\nu = -F_\nu + 2\psi$ for $\psi \in \{0, \ldots, F_\nu\}$, where $\psi$ is the number of agents acting $+1$ in fraction $\nu$. The distribution depends on the proportion of best strategies suggesting opposite actions. If $p_+$ ($p_-$) best strategies suggest the positive (negative) action, $C_\nu(x_i)$ obeys the binomial distribution

$$\Pr\left(C_\nu(x_i) F_\nu = -F_\nu + 2\psi\right) = \binom{F_\nu}{\psi} \left(\frac{p_+}{p_+ + p_-}\right)^{\psi} \left(\frac{p_-}{p_+ + p_-}\right)^{F_\nu - \psi}, \tag{23}$$

where $E[C_\nu(x_i)] = -1 + 2\frac{p_+}{p_+ + p_-}$ and $\mathrm{Var}[C_\nu(x_i)] = \frac{4}{F_\nu} \frac{p_+}{p_+ + p_-} \frac{p_-}{p_+ + p_-}$.

Fractions from the first two groups are marked with $d$ if they suggest $+1$ and with $q$ if they suggest $-1$. Fractions in the third group are indexed with $w$. Eq. (21) then becomes:

$$A(x_i) = \sum_{F_d : C_d(x_i) = 1} F_d - \sum_{F_q : C_q(x_i) = -1} F_q + \sum_{F_w : C_w(x_i) \in [-1, +1]} C_w(x_i) F_w, \tag{24}$$






where 

Further analysis is relatively easy when fractions are of equal sizes, and it becomes more complicated when their sizes are random.

The case of equal-size fractions 

We call the system with the same number of agents in every fraction the reference system, and the corresponding MP the reference MP. $A(x_i)$ is then a random variable that can be expressed as:

$$A(x_i) = \frac{N}{G}\Big(D - Q + \sum_{F_w : C_w(x_i) \in [-1, +1]} C_w(x_i)\Big), \tag{26}$$

Here $D$ and $Q$ are the total numbers of fractions from the first two groups suggesting $+1$ and $-1$, respectively. If the state is deterministic, the components with opposite signs do not compensate, and

$$|D - Q| > \max \Big|\sum_{F_w : C_w(x_i) \in [-1, +1]} C_w(x_i)\Big|. \tag{27}$$

In the limit $NS \to \infty$, inequality (27) always holds when the positive and negative components are unbalanced, i.e. $D \neq Q$. This can be proved in at least two ways.

The general proof uses the strong law of large numbers: the sample average $C_w(x_i)$ converges almost surely to its expected value, i.e.

$$\Pr\Big(\lim_{N \to \infty} C_w(x_i) = E[C_w(x_i)]\Big) = 1. \tag{28}$$

Each $E[C_w(x_i)]$ is equal to zero. Therefore the sum over $F_w : C_w(x_i) \in [-1, +1]$ is equal to zero as well.

The second approach also works outside the limit, but it requires a separate analysis for each state, as in Example 2. For stochastic transitions, $D = Q$ always holds. In such states $A(x_i)$ has a distribution symmetric around zero, so the distribution of $\mathrm{sgn}\,A(x_i)$ is symmetric too and $E[\mathrm{sgn}\,A(x_i)] = 0$. Transitions to the two possible next states are therefore equally probable.

Knowing how to distinguish deterministic from stochastic states, the MP is defined by the following algorithm:

1. Consider all $2^m$ initial states. In these states $U_{\beta_\kappa} = 0$ for every strategy $\kappa$, and the states differ only in the history $\mu$. Because all strategies are equal, both minority decisions are equally likely in each initial state, so the transition is stochastic. The minority decision determines which strategies get a positive or a negative payoff. The updated $U$ and $\mu$ values give $2^{m+1}$ next states.

2. If the next state has several best pairwise different strategies suggesting opposite actions, then $D = Q$. Again two minority decisions, and hence two successor states, are possible, and the transition is stochastic. Both next states have to be determined.

3. If the next state has several best pairwise different strategies all suggesting the same action, then $D \neq Q$. The minority decision is fixed by this action, and the transition is deterministic.

4. If a single strategy has the highest utility, then $D \neq Q$. The minority decision is fixed by the best strategy, and the transition is deterministic.
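The four steps can be turned into a short program for the reference system with $m = 1$, $S = 2$. This is a sketch under explicit assumptions, not the authors' code:

- The strategy table `STRATEGIES` is our inference from Transitions 1-4 described below; Tab. 2 itself is not reproduced here.
- Fractions are the 16 ordered pairs of Tab. 1.
- The limit $NS \to \infty$ is taken, so a state is stochastic exactly when $D = Q$.
- Agents with tied best strategies pick one of them with equal probability.

```python
from fractions import Fraction
from itertools import product

# Assumption: m = 1 strategies inferred from the transitions in the text,
# beta_k -> (action after mu = -1, action after mu = +1).
STRATEGIES = ((-1, -1), (-1, +1), (+1, -1), (+1, +1))
S = 2  # strategies per agent; fractions are ordered S-tuples (G = 16)


def action(k: int, mu: int) -> int:
    return STRATEGIES[k][0 if mu == -1 else 1]


def imbalance(mu: int, utils: tuple[int, ...]) -> int:
    """D - Q: sum of the common actions of fractions whose best strategies agree."""
    total = 0
    for fraction in product(range(len(STRATEGIES)), repeat=S):
        best = max(utils[k] for k in fraction)
        actions = {action(k, mu) for k in fraction if utils[k] == best}
        if len(actions) == 1:          # groups 1 and 2: the fraction acts as one
            total += actions.pop()
    return total                       # group-3 fractions average to zero


def successors(state: tuple[int, ...]) -> list[tuple[tuple[int, ...], Fraction]]:
    mu, utils = state[0], state[1:]
    d_minus_q = imbalance(mu, utils)
    if d_minus_q == 0:                 # steps 1-2: stochastic transition
        outcomes = [(+1, Fraction(1, 2)), (-1, Fraction(1, 2))]
    else:                              # steps 3-4: the minority opposes the majority
        outcomes = [(-1 if d_minus_q > 0 else +1, Fraction(1))]
    result = []
    for minority, prob in outcomes:
        # steplike payoff: +1 if the strategy suggested the minority action, else -1
        new_utils = tuple(u + action(k, mu) * minority for k, u in enumerate(utils))
        result.append(((minority,) + new_utils, prob))  # new history mu = minority
    return result


def build_reference_chain() -> dict:
    """Breadth-first exploration starting from the 2^m initial states."""
    frontier = [(-1, 0, 0, 0, 0), (+1, 0, 0, 0, 0)]
    chain = {}
    while frontier:
        state = frontier.pop(0)
        if state in chain:
            continue
        chain[state] = successors(state)
        frontier.extend(t for t, _ in chain[state] if t not in chain)
    return chain


if __name__ == "__main__":
    chain = build_reference_chain()
    print(len(chain), "states")
    for state, succ in sorted(chain.items()):
        print(state, "->", [(t, str(p)) for t, p in succ])
```

Starting from the two initial states, the exploration closes on twelve states. Its transitions can be compared with the left panel of Fig. 13.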

In Example 1 we show how to find successive states and their transition probabilities with the algorithm above. In Example 2 we then check step by step whether a transition is stochastic or deterministic for a finite number of agents.

Example 1: transition scenarios for m = 1 case 

An example realization of $A(t)$ for the reference MP is given in Fig. 11 (upper left). The estimated distribution of $A$ is symmetric (upper right), and so is the distribution of $\mathrm{sgn}(A)$ (lower right)⁵. The scatter plot of $A(t + \tau)$ against $A(t)$, with $\tau = 2^{m+1}$, shows periodicity and the existence of preferred values of $A$ (lower left). In this case the full specification of the states and the calculation of the transition matrix are relatively easy. All strategies are listed in Tab. 2. Possible transition scenarios for the $m = 1$ MG, represented as a Markov chain, are illustrated in Fig. 12. At the beginning of the game all utilities are equal to zero. Depending on the history $\mu$, only two initial states can exist: $x_1 = [-1, 0, 0, 0, 0]$ and $x_2 = [1, 0, 0, 0, 0]$. From each of these two states, two further scenarios are equally likely, because the corresponding strategies have the same utilities. Which one occurs depends on the ratio between the numbers of agents in two groups, one with $a = 1$ and the other with $a = -1$. The scenarios are as follows.

⁵The small asymmetries visible in Fig. 11 are due to the finite number of samples used for estimation.

Figure 11: Time evolution of the aggregated demand $A(t)$ (upper left), plot of the aggregated demand $A(t + 2 \cdot 2^m)$ vs. $A(t)$ (lower left), estimated $\Pr(A)$ (upper right) and $\Pr(\mathrm{sgn}(A))$ (lower right), for population size $N = 400$, agent memory $m = 1$, $S = 2$ strategies per agent and identical sizes of fractions.

Table 2: Strategies for $m = 1$



Transition 1

In the state $x_1$, the majority of agents use strategies suggesting $a = -1$. Then

• the minority action in the next step is $a^* = 1$,

• strategies $\beta_1$ and $\beta_2$ give a negative payoff,

• strategies $\beta_3$ and $\beta_4$ give a positive payoff.
Figure 12: Trajectories of utilities for $m = 1$. The panels show Transitions 1-4, e.g. Transition 3: $x_3 \to x_5 \to x_4$ with $\mu(t) = 1, -1, -1$, and Transition 4: $x_3 \to x_7 \to x_9 \to x_1$ with $\mu(t) = 1, 1, -1, -1$.

The system goes to the state $x_3 = [1, -1, -1, 1, 1]$ (cf. Fig. 12, Transition 1). There $U_{\beta_3} = U_{\beta_4} = 1$, and these two strategies suggest different actions for the last history $\mu = 1$. Likewise, the two strategies with $U_{\beta_1} = U_{\beta_2} = -1$ suggest different actions for $\mu = 1$. Hence two equiprobable scenarios follow, described below as Transitions 3 and 4.

Transition 2

In the state $x_1$, the majority of agents use strategies suggesting $a = 1$. Then

• the minority action in the next step is $a^* = -1$,

• strategies $\beta_3$ and $\beta_4$ give a negative payoff,

• strategies $\beta_1$ and $\beta_2$ give a positive payoff.

The system goes to the state $x_{11} = [-1, 1, 1, -1, -1]$ (cf. Fig. 12, Transition 2). There $U_{\beta_1} = U_{\beta_2} = 1$, and both strategies give the same action for the last history $\mu = -1$. Most agents use these strategies (3/4 of the population for $S = 2$), so the system can only go to the state $x_2$.
Transition 3

In the state $x_3$, the majority of agents use strategies suggesting $a = 1$, and the system passes to $x_5$. In this state $U_{\beta_3} = U_{\max}$ and $U_{\beta_2} = U_{\min}$ (cf. Fig. 12, Transition 3). By the reasoning of Section 5.1, once a utility reaches its maximal or minimal value, most agents use strategies suggesting the same action as the best strategy. Only one scenario is therefore possible in $x_5$. The best strategy, together with all strategies giving the same output, loses, and the system goes to the state $x_4$.

Transition 4

The other possibility in $x_3$ is that most agents choose $a = -1$, and the system goes to $x_7$. In this state $U_{\beta_4} = U_{\max}$ and $U_{\beta_1} = U_{\min}$ (cf. Fig. 12, Transition 4). The best strategy, together with all strategies giving the same output, then loses, and the system goes to the state $x_9$. In $x_9$ both best strategies suggest the same action for the last history $\mu = -1$. The majority of the population uses one of these best strategies, and the system moves to $x_1$.

Transitions 5-8

These transitions are analogous to Transitions 1-4, but they start from the state $x_2$.

The states are listed in Tab. 3. These states and transitions suffice to define a memoryless representation of the MG, whose transition graph is displayed in Fig. 13. Some states have the same expected demand $E[A]$ over realizations of the game. For example, $E[A(x_i)] = 0$ for $i = 1, \ldots, 4$, because equal numbers of agents follow strategies recommending opposite actions. Eqs. (17)-(18) give $E[A]$ for all states (cf. Tab. 3). This agrees with Fig. 11, where five clusters on the diagonal lie around the values from Tab. 3. Our process is a stationary Markov chain, and its stationary Master Equation can be solved for the state probabilities. These are given in the column $\Pr(x_i)$ ($i = 1, \ldots, 12$) of Tab. 3. They can also be used to find statistical periods of the demand:

$$\Pr[A(t) = A(t + \tau)] = \sum_{i,j} \delta\big[A(x_j(t + \tau)), A(x_i(t))\big] \cdot \Pr\big[x_j(t + \tau) \,|\, x_i(t)\big] \cdot \Pr\big[x_i(t)\big], \tag{29}$$

where $\delta(x, y)$ is the Kronecker symbol. The maximal value, 7/16, occurs for $\tau = 4$. This explains why the largest correlation is also found for $\tau = 4$ (cf. Figs. 11 and 14).

Table 3: States $x_i$ ($i = 1, \ldots, 12$), their probabilities $\Pr(x_i)$ and demands for $m = 1$. E stands for the expected value of $A$ in the state $x_i$.
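Every cycle length of this chain is divisible by four, so the chain has period 4 and plain power iteration does not converge. The stationary Master Equation can, however, be solved directly. The sketch below does this with exact fractions. Its transition table is our own reading of Transitions 1-8 and of the left panel of Fig. 13, so it should be checked against the figure. Under these assumptions the solver returns $\Pr(x_i) = 1/8$ for $i = 1, \ldots, 4$ and $1/16$ for the other eight states, to be compared with the $\Pr(x_i)$ column of Tab. 3.

```python
from fractions import Fraction

HALF, ONE = Fraction(1, 2), Fraction(1)
# Assumed transition graph of the reference MP for m = 1 (our reading of
# Transitions 1-8 and Fig. 13, left); keys and targets are the indices i of x_i.
TRANSITIONS = {
    1: {3: HALF, 11: HALF}, 2: {4: HALF, 12: HALF},
    3: {5: HALF, 7: HALF}, 4: {6: HALF, 8: HALF},
    5: {4: ONE}, 6: {3: ONE}, 7: {9: ONE}, 8: {10: ONE},
    9: {1: ONE}, 10: {2: ONE}, 11: {2: ONE}, 12: {1: ONE},
}


def stationary_distribution(transitions: dict) -> dict:
    """Solve pi = pi P with sum(pi) = 1 exactly by Gauss-Jordan elimination."""
    states = sorted(transitions)
    index = {s: k for k, s in enumerate(states)}
    n = len(states)
    # Row j encodes sum_i pi_i P(i -> j) - pi_j = 0 (augmented column n holds the RHS).
    rows = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for i, targets in transitions.items():
        for j, p in targets.items():
            rows[index[j]][index[i]] += p
    for j in range(n):
        rows[j][j] -= 1
    rows[-1] = [Fraction(1)] * (n + 1)   # replace one balance equation by sum(pi) = 1
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [v / lead for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return {s: rows[index[s]][n] for s in states}


if __name__ == "__main__":
    for state, prob in stationary_distribution(TRANSITIONS).items():
        print(f"x_{state}: {prob}")
```

The chain is irreducible, so replacing one balance equation with the normalization gives a nonsingular system.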

Example 2: Deterministic transitions 

Here we show how to prove that the transition from a given state is deterministic. We assume a reference system and a game in the herd regime, but not necessarily the limit $NS \to \infty$. We also show that the transition can change if agents are assigned to fractions randomly.

Consider an arbitrarily chosen state for $S = 2$ and $m = 1$ with a deterministic transition, e.g. $x_5 = [-1, 0, -2, 2, 0]$. Assume that fraction indexes are assigned to pairs of strategies as in Tab. 1. Analyzing each fraction, one finds that:

• For fractions $F_{11}, F_{12}, F_{15}, F_{16}$, both strategies suggest $+1$. Hence $C_\nu(x_5) = +1$ for $\nu \in \{11, 12, 15, 16\}$.

• For fractions $F_3, F_7, F_8, F_9, F_{10}, F_{14}$, the strategy with the higher $U$ suggests $+1$. Hence $C_\nu(x_5) = +1$ for $\nu \in \{3, 7, 8, 9, 10, 14\}$.

• In fractions $F_1, F_2, F_5, F_6$, both strategies suggest $-1$. Thus $C_\nu(x_5) = -1$ for $\nu \in \{1, 2, 5, 6\}$.

• Finally, fractions $F_4$ and $F_{13}$ each contain two strategies with equal utilities, hence chosen with equal probabilities, that suggest opposite actions. For $\nu \in \{4, 13\}$, $C_\nu(x_5)$ therefore follows the binomial distribution (23).

Figure 13: Diagrams of the Markov chain representation of the MG in the efficient regime for $m = 1$. Numbers in circles denote the states $x_1 = [-1,0,0,0,0]$, $x_2 = [1,0,0,0,0]$, $x_3 = [1,-1,-1,1,1]$, $x_4 = [-1,1,-1,1,-1]$, $x_5 = [-1,0,-2,2,0]$, $x_6 = [1,0,-2,2,0]$, $x_7 = [1,-2,0,0,2]$, $x_8 = [-1,2,0,0,-2]$, $x_9 = [-1,-1,-1,1,1]$, $x_{10} = [1,1,-1,1,-1]$, $x_{11} = [-1,1,1,-1,-1]$ and $x_{12} = [1,-1,1,-1,1]$. Grey states have $\mu = -1$ and white ones $\mu = +1$. Values on the arrows are transition probabilities. Two cases are shown: equal fractions (left) and unequal fractions, where agents draw strategies with uniform probability (right). For equal fractions, whenever two successor states are possible, both transition probabilities are the same.

For the reference system (equal fractions), $D = 10$ and $Q = 4$, so $E[A(x_5)] = \frac{N}{16}(D - Q) = \frac{3}{8}N$. The uncertainty comes from agents in fractions $F_4$ and $F_{13}$, who choose $-1$ or $+1$ with equal probability. Consequently $A(x_5) \in \{\frac{4}{16}N, \ldots, \frac{8}{16}N\}$ and $\mathrm{Var}[A(x_5)] = N/8$. Hence $A(x_5)$ is always positive and $a^*(x_5) = -1$, so the successor state is determined unambiguously. The same analysis can be performed for any state, which makes the variance of the aggregated demand easy to calculate (cf. Tab. 3).
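The bookkeeping of Example 2 can be automated. This sketch makes the following assumptions:

- Fractions have equal size $N/16$ and follow the Tab. 1 ordering.
- The $m = 1$ strategy table (`STRATEGIES`) is the one inferred from Transitions 1-4.
- Agents whose best strategies tie effectively flip a fair coin.

The function name is hypothetical. For $x_5$ and $N = 1600$ it gives $E[A] = 600 = 3N/8$, $\mathrm{Var}[A] = 200 = N/8$ and the range $[400, 800]$.

```python
from fractions import Fraction
from itertools import product

# Assumption: m = 1 strategy table inferred from Transitions 1-4 of this section,
# beta_k -> (action after mu = -1, action after mu = +1).
STRATEGIES = ((-1, -1), (-1, +1), (+1, -1), (+1, +1))


def reference_demand_stats(state: tuple[int, ...], n_agents: int):
    """E[A], Var[A], D - Q and the range of A(x) for S = 2 and equal fractions N/16."""
    mu, utils = state[0], state[1:]
    col = 0 if mu == -1 else 1
    pairs = list(product(range(len(STRATEGIES)), repeat=2))   # Tab. 1 ordering
    size = Fraction(n_agents, len(pairs))
    mean = var = Fraction(0)
    d_minus_q, tied_agents = 0, Fraction(0)
    for pair in pairs:
        best = max(utils[k] for k in pair)
        actions = [STRATEGIES[k][col] for k in pair if utils[k] == best]
        c_mean = Fraction(sum(actions), len(actions))  # E[C_nu]
        mean += size * c_mean
        var += size * (1 - c_mean ** 2)                 # sum of per-agent variances
        if c_mean in (-1, 1):
            d_minus_q += int(c_mean)                    # deterministic fraction (d or q)
        else:
            tied_agents += size                         # symmetric coin-flipping fraction
    return mean, var, d_minus_q, (mean - tied_agents, mean + tied_agents)


if __name__ == "__main__":
    x5 = (-1, 0, -2, 2, 0)
    print(reference_demand_stats(x5, 1600))
```

Because the lower end of the range is positive, no realization of the coin flips can reverse the sign of $A(x_5)$. This is the sense in which the transition is deterministic.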

Any MG with $m > 1$ in the efficient regime can be represented as a Markov process with a finite number of states. Its state probabilities can be calculated by the same method as for $m = 1$, at a higher computational cost. The reasoning presented is strictly valid only in the ideal case of equal subpopulations in all fractions. It also holds if the system is considered a priori, i.e. before strategies are assigned to agents at the beginning of the game. In an a posteriori analysis, the strategies have already been assigned. Such a game usually has unequal fraction sizes, because strategy generation is initially random (quenched disorder). Considered a priori, Example 2 keeps the same expected value $E[A(x_5)]$, but the variance changes enough to allow negative samples. Considered a posteriori, $E[A(x_5)]$ is also most likely biased with respect to its reference value. We show that some interesting phenomena appear only when the quenched disorder, i.e. an imbalance between fractions, is taken into account. Among them is the sensitivity of the predictability $H_a$ to the payoff.

The case of unequal-size fractions 

If strategies are assigned to agents randomly, fraction sizes are likely to be unequal. Consider one of the simplest cases, where strategies are drawn with equal probabilities. This corresponds to assigning an agent to any fraction with probability $1/G$. Numerical experiments show that in this case the reconstructed MP usually follows the sequence of states of the reference MP but does not reproduce its transition probabilities. This bias persists even when the game is enlarged (see Figs. 13 and 14). The explanation is as follows.

States of the reference MP with stochastic transitions have equal numbers of positive and negative components in formula (24). To calculate the transition probability, we consider two cases: before and after strategies are assigned to agents, i.e. the a priori and the a posteriori case.

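When calculating the a priori expected value of $\mathrm{sgn}\,A$, the number of agents in the $\nu$-th fraction is not yet known, so we operate only on random variables:

$$E[\mathrm{sgn}\,A(x_i)] = E\Big[\mathrm{sgn}\Big(\sum_{\nu} C_\nu(x_i) F_\nu\Big)\Big] = E\Big[\mathrm{sgn}\Big(\sum_{F_d : C_d(x_i) = 1} F_d - \sum_{F_q : C_q(x_i) = -1} F_q + \sum_{F_w : C_w(x_i) \in [-1, +1]} C_w(x_i) F_w\Big)\Big]. \tag{30}$$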
Figure 14: Time evolution of the aggregated demand $A(t)$ (upper left), plot of the aggregated demand $A(t + 2 \cdot 2^m)$ vs. $A(t)$ (lower left), estimated $\Pr(A)$ (upper right) and $\Pr(\mathrm{sgn}(A))$ (lower right), for population size $N = 400$, agent memory $m = 1$, $S = 2$ strategies per agent and unequal sizes of fractions.



Each fraction size $F_\nu$ obeys the same binomial distribution. Because we consider stochastic transitions of the reference system, the first and second sums of Eq. (30) contain the same number of terms. The third sum is symmetric around zero because its components are pairwise symmetric. The distribution of $A(x_i)$ is therefore symmetric, and so is that of $\mathrm{sgn}\,A(x_i)$. Hence $E[\mathrm{sgn}\,A(x_i)] = 0$.

Once strategies are assigned to agents, the numbers of agents in the fractions, $f_\nu$, are known, and the system is considered a posteriori. $E[\mathrm{sgn}\,A(x_i)]$ can then be decomposed as:

$$E[\mathrm{sgn}\,A(x_i)] = E\Big[\mathrm{sgn}\Big(\sum_{f_d : C_d(x_i) = 1} f_d - \sum_{f_q : C_q(x_i) = -1} f_q + \sum_{f_w : C_w(x_i) \in [-1, +1]} C_w(x_i) f_w\Big)\Big]. \tag{31}$$



For $S = 2$, the last sum in Eq. (31) is symmetric around zero because $C_w$ is symmetric. The first two sums, however, introduce a bias that shifts the distribution of $A(x_i)$. For $S > 2$ the third term may be biased as well. Since $\mathrm{Var}[F] \to \infty$ for $N \to \infty$, the first two components alone give

$$\mathrm{Var}\Big[\sum_{F_d : C_d(x_i) = 1} F_d - \sum_{F_q : C_q(x_i) = -1} F_q\Big] \to \infty. \tag{32}$$

A bias of any given size therefore becomes increasingly likely as $N$ grows.

If the a posteriori distribution of $A(x_i)$ is shifted, then the a posteriori distribution of $\mathrm{sgn}\,A(x_i)$ is asymmetric, whatever the symmetry of the last term. Consequently, most likely $E[\mathrm{sgn}\,A(x_i)] \neq 0$. The a priori and a posteriori values of $E[\mathrm{sgn}\,A(x_i)]$ coincide only if all fractions contain equal numbers of agents. Otherwise the expected absolute bias of the distribution grows with $N$, and the probabilities of stochastic transitions are most likely unequal. In some experiments, the bias shifted the distribution for specific states so strongly that it was always positive or always negative. A state that is stochastic a priori may therefore become deterministic when analyzed a posteriori.
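A small simulation illustrates the claim that the a posteriori shift does not wash out for large $N$. It is a sketch under assumptions:

- It uses the stochastic reference state $x_1$ and the fraction classification implied by the $m = 1$ strategy table inferred above.
- The fraction indices follow the Tab. 1 ordering.
- Agents are assigned uniformly to the $G = 16$ fractions.

It measures the first two sums of Eq. (31) and compares them with the spread of the symmetric third sum. The helper names, seed and number of trials are our own.

```python
import random
from collections import Counter
from statistics import mean

G = 16
# Stochastic reference state x_1 = [-1, 0, 0, 0, 0] with the inferred m = 1 strategy
# table; 0-based fraction index nu - 1 = 4 i + j (Tab. 1 ordering).
D_IDX = (10, 11, 14, 15)   # F_11, F_12, F_15, F_16: both strategies suggest +1
Q_IDX = (0, 1, 4, 5)       # F_1, F_2, F_5, F_6: both strategies suggest -1


def quenched_bias(n_agents: int, rng: random.Random) -> tuple[int, int]:
    """One a posteriori assignment: (sum f_d - sum f_q, number of agents in tied fractions)."""
    counts = Counter(rng.choices(range(G), k=n_agents))
    f_d = sum(counts[v] for v in D_IDX)
    f_q = sum(counts[v] for v in Q_IDX)
    return f_d - f_q, n_agents - f_d - f_q


def bias_statistics(n_agents: int, trials: int = 200, seed: int = 0) -> tuple[float, float]:
    """Mean |bias| and mean |bias| / sqrt(tied agents) over random assignments.

    Tied agents play +1 or -1 with probability 1/2, so sqrt(tied agents) is the
    standard deviation of the symmetric third term of Eq. (31).
    """
    rng = random.Random(seed)
    samples = [quenched_bias(n_agents, rng) for _ in range(trials)]
    mean_abs_bias = mean(abs(b) for b, _ in samples)
    mean_rel_shift = mean(abs(b) / tied ** 0.5 for b, tied in samples)
    return mean_abs_bias, mean_rel_shift


if __name__ == "__main__":
    for n in (100, 1000, 4000):
        abs_bias, rel_shift = bias_statistics(n)
        print(f"N={n:5d}  mean|bias|={abs_bias:6.1f}  shift/spread={rel_shift:.2f}")
```

The absolute bias grows roughly like $\sqrt{N}$. Its ratio to the width of the symmetric part stays of order one, so the two transition probabilities out of $x_1$ generally remain unequal however large the game is.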

Finally, consider the states with deterministic transitions in the reference 
system. If now F is a random variable, then with some, usually very small, 
probability the transition becomes stochastic due to the specific realization 
of F. The analysis of one specific state is given in Example 2 of the present 
section. 

4.2.4. Stochasticity of the game depends on initial conditions

We assumed that $U_{a_{s,n}}(t = 0) = 0$ for all $a_{s,n}$. This assumption seems natural, since it reflects no a priori preference for any strategy. However, it turns out to be critical for the MG dynamics with g(x) = sgn(x). Stochastic transitions occur in degenerate states, i.e. states where more than one strategy has the same utility. Removing this ambiguity suppresses stochasticity, and the game becomes deterministic. Our simplified state description then fails, because strategies have unique utilities and cannot be aggregated. Consequently, the Markovian treatment of Sec. 4.2.3 is no longer useful. Instead, the description in terms of the Markov process defined as for the proportional payoff g(x) = x becomes interesting. In particular, the game follows an Eulerian path on the de Bruijn graph and is deterministic (cf. Sec. [?]).
4.2.5. Stability of the game and behaviour of the predictability H

Disproportions between fractions affect transition probabilities. If the absolute disproportions are very large, some transitions of the reference system can disappear and the graph shrinks to a subgraph. The game remains stable because every subgraph is a sequence of states in which $+1$ and $-1$ follow a given $\mu$ with the same frequency (cf. white and grey circles, respectively, in Fig. 13). For g(x) = sgn(x), equal frequencies of the opposite minority decisions after every $\mu$ are both necessary and sufficient for stability. Stability therefore entails equal frequencies, so $\langle a^* | \mu \rangle = 0$. Whether or not the system is the reference one, $H_a$ is zero whenever the game is stable. The mechanism works as long as the game is deep in the herd regime, i.e. $NS \gg 2^P$, and strategies are drawn from the uniform distribution or one close to it. If the game moves to the cooperation mode, or strategies are drawn from an asymmetric distribution, the relative disproportions between fractions become large and the MP methodology breaks down. Stability is then distorted and additional states appear.

The stability mechanism requires a balance between the frequencies of negative and positive signs of $A$ after any $\mu$, regardless of the value of $A$. The quantity $\langle A | \mu \rangle$ in formula (8) can be redefined as follows:

$$\langle A | \mu \rangle \approx \sum_{i=1}^{\# X^\mu} E[A(x_i^\mu)] \Pr(x_i^\mu), \tag{33}$$

where $X^\mu$ is the set of all states $x_i$ containing history $\mu$. Approximation (33) replaces each partial sum of the random variable $A(x_i^\mu)$ in state $x_i^\mu$ by its expected value in that state, $E[A(x_i^\mu)]$. Eq. (33) becomes exact in the limit of infinite time, $T \to \infty$.

In an a posteriori analysis, $\sum_{x_i^\mu \in X^\mu} E[A(x_i^\mu)] \Pr(x_i^\mu) = 0$ only for equal fractions. In that case there always exists a pair of states with the same $\mu$, the same probabilities and distributions symmetric around zero. Suppose the real system draws strategies from the flat distribution. The larger the game, the larger the possible disproportions of $E[A(x_i^\mu)]$ between the reference and the real system. As a result, $\big|\sum_{x_i^\mu \in X^\mu} E[A(x_i^\mu)] \Pr(x_i^\mu)\big|$ grows with the population size. Hence, in the herd regime, $H_a$ as a function of the control parameter $n = N/P$ is larger than zero whenever the system differs from the reference one.
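For the reference system, the cancellation in Eq. (33) can be verified exactly. The sketch below is an illustration under stated assumptions:

- The strategy table is inferred from the transitions of Sec. 4.2.3.
- The twelve states are those listed in Fig. 13.
- The probabilities 1/8 and 1/16 come from solving the master equation for our reading of that graph; Tab. 3 is the authoritative source.

It evaluates the per-capita form of Eq. (33) for $\mu = \pm 1$. Both sums vanish because states with opposite $E[A]$ pair up within each history.

```python
from fractions import Fraction
from itertools import product

# Assumptions: inferred m = 1 strategy table, the twelve states of Fig. 13 and the
# reference stationary probabilities 1/8 (x_1..x_4) and 1/16 (x_5..x_12).
STRATEGIES = ((-1, -1), (-1, +1), (+1, -1), (+1, +1))
STATES = {
    1: (-1, 0, 0, 0, 0), 2: (1, 0, 0, 0, 0), 3: (1, -1, -1, 1, 1), 4: (-1, 1, -1, 1, -1),
    5: (-1, 0, -2, 2, 0), 6: (1, 0, -2, 2, 0), 7: (1, -2, 0, 0, 2), 8: (-1, 2, 0, 0, -2),
    9: (-1, -1, -1, 1, 1), 10: (1, 1, -1, 1, -1), 11: (-1, 1, 1, -1, -1), 12: (1, -1, 1, -1, 1),
}
PROB = {i: Fraction(1, 8) if i <= 4 else Fraction(1, 16) for i in STATES}


def expected_demand_per_capita(state: tuple[int, ...]) -> Fraction:
    """E[A(x)]/N for equal fractions: the average of E[C_nu] over the 16 fractions."""
    mu, utils = state[0], state[1:]
    col = 0 if mu == -1 else 1
    pairs = list(product(range(len(STRATEGIES)), repeat=2))
    total = Fraction(0)
    for pair in pairs:
        best = max(utils[k] for k in pair)
        tied = [STRATEGIES[k][col] for k in pair if utils[k] == best]
        total += Fraction(sum(tied), len(tied))   # mean action of the tied best strategies
    return total / len(pairs)


def conditional_demand(mu: int) -> Fraction:
    """Per-capita Eq. (33): sum over states with history mu of E[A(x)]/N * Pr(x)."""
    return sum(
        (expected_demand_per_capita(x) * PROB[i] for i, x in STATES.items() if x[0] == mu),
        Fraction(0),
    )


if __name__ == "__main__":
    for i, x in STATES.items():
        print(f"x_{i}: mu={x[0]:+d}  E[A]/N={expected_demand_per_capita(x)}")
    print("<A|mu=-1>/N =", conditional_demand(-1), " <A|mu=+1>/N =", conditional_demand(+1))
```

With unequal fractions, the per-state expectations and probabilities are perturbed, and this pairwise cancellation is no longer guaranteed.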






4.2.6. Variance per capita $\sigma^2/N$

For simplicity, we consider here only the case of equal fractions and do not distinguish between a priori and a posteriori games. The variance per capita (6) is defined through the sum over the set of all states $X$. If the game is large enough, the MP representation gives the following approximation:

$$\sigma^2(A) \approx \sum_{i=1}^{\# X} \Pr(x_i)\, E[A(x_i)]^2 \tag{34}$$

$$= \sum_{i=1}^{\# X} \Pr(x_i) \Big(\frac{N}{G} \sum_{\nu=1}^{G} E[C_\nu(x_i)]\Big)^2. \tag{35}$$

In deriving Eq. (34) from Eq. (6) we expand $\sigma^2(A)$ into partial sums over states and use the fact that the variation of $\mathbb{E}[A(x_i)]$ from state to state is significantly larger than the width of the distribution of $A$ in any state (cf. Figs. 11 and 4, upper right). A more detailed explanation follows.

In Eq. (4), each value of the demand $A(t)$ is generated in one of $K$ possible states. Assuming ergodicity, the sum over time steps $t$ in Eq. (4) ($t = 0, \dots, T$) can be represented as a sum over all $T$ visits in states $x_k$ ($k = 1, \dots, K$, with $K = \#X$). Since each state is visited many times, the sum over visits can be decomposed into partial sums over states:

$$\sigma^2(A) = \frac{1}{T}\sum_{t=0}^{T} A(t)^2 = \frac{1}{T}\sum_{k=1}^{K}\sum_{i_k=1}^{I_k} A_{i_k}(x_k)^2 \approx \sum_{k=1}^{K} \Pr(x_k)\,\mathbb{E}\left[A(x_k)^2\right], \qquad (36)$$



where $i_k$ runs over subsequent moments when the system is in the $k$-th state and $I_k$ stands for the number of visits in this state, so that $\Pr(x_k) \approx I_k/T$. For any state $x_k$ the random variable $A(x_k)$ can be represented as a sum

$$A(x_k) = \mathbb{E}[A(x_k)] + \eta(x_k), \qquad (37)$$

where $\eta(x_k)$ is a random variable with $\mathbb{E}[\eta(x_k)] = 0$. Hence

$$\mathbb{E}\left[A(x_k)^2\right] = \mathbb{E}[A(x_k)]^2 + \mathbb{E}\left[\eta(x_k)^2\right]. \qquad (38)$$
Since, depending on the state (see Tab. 3),

$$\mathbb{E}[A(x_k)]^2 \sim 0 \ \text{or}\ N^2, \qquad (39)$$

$$\mathbb{E}\left[\eta(x_k)^2\right] \sim N \ \text{or}\ 0, \qquad (40)$$

the second term in Eq. (38) may be neglected for large $N$, and one arrives at Eq. (34).
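The decomposition in Eqs. (36) to (38) can be checked numerically. The sketch below assumes, for simplicity, that states are drawn independently with fixed occupancies (rather than from an actual Markov chain) and that $\eta(x_k)$ is Gaussian with variance $N$ in every state; the state means are loosely modeled on the five peak levels of Tab. 4 with equal occupancies, which is an illustrative assumption and not simulation data.

```python
import math
import random


def simulate_demand(means, probs, noise_sd, T, seed=0):
    """Draw T demand values A = E[A(x_k)] + eta (Eq. (37)).

    Simplifying assumption: states x_k are drawn i.i.d. with probabilities
    `probs` instead of from a Markov chain, and eta ~ N(0, noise_sd**2).
    """
    rng = random.Random(seed)
    ks = rng.choices(range(len(means)), weights=probs, k=T)
    return [means[k] + rng.gauss(0.0, noise_sd) for k in ks]


def variance_terms(means, probs, noise_sd):
    """Return the two terms of sum_k Pr(x_k) E[A(x_k)^2] split as in Eq. (38)."""
    between = sum(p * m * m for p, m in zip(probs, means))  # the term kept in Eq. (34)
    within = sum(p * noise_sd ** 2 for p in probs)  # the term neglected for large N
    return between, within


N = 1000
# Twelve states grouped into the five peak levels of Tab. 4 (equal occupancy assumed).
means = [-N / 2] * 2 + [-3 * N / 8] * 2 + [0.0] * 4 + [3 * N / 8] * 2 + [N / 2] * 2
probs = [1 / 12] * 12
between, within = variance_terms(means, probs, math.sqrt(N))
A = simulate_demand(means, probs, math.sqrt(N), T=200_000)
empirical = sum(a * a for a in A) / len(A)  # (1/T) sum_t A(t)^2, as in Eq. (36)
```

Here `within / between` is below 1%, which is the sense in which the second term of Eq. (38) is dropped, while `empirical` agrees with `between + within` up to sampling error.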

To guide intuition, let us consider the example from Fig. 11 (upper right). This joint distribution of $A$ is a sum of the distributions for twelve states. Five distinct peaks correspond to distributions of $A$ in groups of states. The states corresponding to the peaks, as well as the squared expected values of $A$ and the expected values of $\eta^2$, are given in Tab. 4.



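| Peak (from left) | States $x_k$, $k = 1, \dots, 12$ | $\mathbb{E}[A(x_k)]^2$ | $\mathbb{E}[\eta(x_k)^2]$ |
|---|---|---|---|
| 1 | $x_{11}, x_{12}$ | $(-N/2)^2$ | $0$ |
| 2 | $x_8, x_6$ | $(-3N/8)^2$ | $\sim N$ |
| 3 | $x_1, x_2, x_3, x_4$ | $0$ | $\sim N$ |
| 4 | $x_5, x_7$ | $(3N/8)^2$ | $\sim N$ |
| 5 | $x_9, x_{10}$ | $(N/2)^2$ | $0$ |

Table 4: Squared expected values of $A$ and expected values of $\eta^2$ for the five peaks seen in Fig. 11 (upper right).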

The number of fractions in which all strategies suggest the same given action after $\mu$ is always $(2^{P-1})^S$, where $2^{P-1}$ is the half of the strategy space whose strategies all suggest that action. Hence at least $2(2^{P-1})^S$ terms $c_v$ in the sum (34) compensate mutually, since the unanimous $+1$ fractions cancel the unanimous $-1$ ones. Consequently, there are $2^{PS} - 2^{(P-1)S+1}$ terms that, in the worst case, are not compensated. Indeed, one can find states where all actions of these fractions are equal to $+1$ or $-1$, but also states where the contributions of all fractions compensate to $0$. Hence

$$0 \le \left|\sum_{v=1}^{G} c_v(x_i)\right| \le 2^{PS} - 2^{(P-1)S+1}, \qquad (41)$$

where the upper bound can be factorized as $2^{PS}(1 - 2^{1-S})$, and only the number of different fractions, $G = 2^{PS}$, depends on $P$. In particular, for $S = 2, 3, 4, 5$ the bound equals $G/2$, $3G/4$, $7G/8$ and $15G/16$, respectively. Generally,

$$\left|\mathbb{E}[A(x_i)]\right| \sim N. \qquad (42)$$

As a result, in Eq. (34), $\sigma^2 \sim N^2$ and $\sigma^2/N \sim N$, in agreement with numerical simulations [7] and theoretical results [17], [14], [15].
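The counting behind Eq. (41) can be verified by brute force for small memories. The sketch below is an illustration, not the authors' code: it assumes that a strategy is a tuple of $P$ actions (one per history), that a fraction is an ordered $S$-tuple of strategies (so that $G = 2^{PS}$), and it fixes the history to the first index. It counts the unanimous fractions and compares the remainder with the bound $G(1 - 2^{1-S})$.

```python
from fractions import Fraction
from itertools import product


def count_fractions(P, S, mu=0):
    """Return (G, number of fractions whose S strategies all agree after mu)."""
    strategies = list(product((-1, 1), repeat=P))  # all 2**P strategies
    total = unanimous = 0
    for fraction in product(strategies, repeat=S):  # ordered S-tuples of strategies
        total += 1
        if len({beta[mu] for beta in fraction}) == 1:
            unanimous += 1
    return total, unanimous


def bound_factor(S):
    """Upper bound of Eq. (41) divided by G, i.e. 1 - 2**(1 - S)."""
    return 1 - Fraction(1, 2 ** (S - 1))
```

For example, `[bound_factor(S) for S in (2, 3, 4, 5)]` gives `[Fraction(1, 2), Fraction(3, 4), Fraction(7, 8), Fraction(15, 16)]`, i.e. the factors $G/2$, $3G/4$, $7G/8$ and $15G/16$ quoted above.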



The variance is no longer proportional to $N^2$ if the game leaves the herd regime. In the random mode there are fewer agents than fractions, and therefore

$$\sigma^2(x_i) = \sum_{n=1}^{N}\mathrm{Var}[a_n(x_i)]. \qquad (43)$$

Considering further the case $S = 2$, on average half of the agents have no choice because their two strategies suggest the same action. Decisions in this half of the population compensate mutually and do not influence $A$. There are states where the rest of the population has a choice, and thus $\sigma^2(x_i) = \sum_{n=1}^{N/2}\mathrm{Var}[a_n(x_i)]$. Hence $\sigma^2 \sim N/2$.
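A quick Monte Carlo check of the estimate $\sigma^2 \sim N/2$ for $S = 2$ in the random mode. The sketch fixes a single history $\mu$ and rests on assumptions of our own: strategy indices below $2^{P-1}$ prescribe $+1$ after $\mu$ and the rest $-1$ (a relabelling, since exactly half of all strategies prescribe each action), each agent draws two strategies uniformly with replacement, and an agent whose two strategies disagree picks one of them by a fair coin at every step.

```python
import random


def random_mode_demand(N, P, T=2000, seed=1):
    """Return (number of agents with a choice, list of T demand values)."""
    rng = random.Random(seed)
    half = 2 ** (P - 1)
    fixed = 0   # summed actions of agents whose two strategies agree
    n_free = 0  # agents whose two strategies disagree after mu
    for _ in range(N):
        a1 = 1 if rng.randrange(2 ** P) < half else -1
        a2 = 1 if rng.randrange(2 ** P) < half else -1
        if a1 == a2:
            fixed += a1
        else:
            n_free += 1
    demands = []
    for _ in range(T):
        # each free agent contributes +1 or -1 with probability 1/2
        ups = bin(rng.getrandbits(n_free)).count("1") if n_free else 0
        demands.append(fixed + 2 * ups - n_free)
    return n_free, demands


def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)
```

For $N = 4000$ and $P = 16$ (so there are far fewer agents than the $2^{PS} = 2^{32}$ fractions), `n_free` comes out close to $N/2$ and `variance(demands)` close to `n_free`, as the argument above suggests.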

In the cooperation regime, most fractions are in play, but the fluctuations of $F$ are still relatively large, so some fractions are more populated than others. Strategies in less populated fractions win more frequently. The impact of these fractions is compensated by the larger fractions, and therefore the variance is minimal. This reflects the balance between the crowd and the anticrowd in the so-called crowd-anticrowd approach [17].



4.3. The payoff g(x) = x 

The linear payoff $g(x) = x$ requires different methods of analysis than the steplike one. For $g(x) = \mathrm{sgn}(x)$, in each state there are strategies suggesting different actions with the same utility. If an agent has two or more best strategies with the same utility, it chooses one of them randomly. As a result, some transitions are stochastic. Moreover, the utility is bounded from below and above, $U_{\min} < U(t) < U_{\max}$, where $U_{\min(\max)} = -(+)2^m$, so the number of possible utility values is relatively small. For $g(x) = x$, the probability that pairwise different strategies have the same utility is small compared to the case $g(x) = \mathrm{sgn}(x)$, and the range of possible $U$ is much wider, from $-N/2$ to $N/2$, provided that the system is the reference one. As a result, the stochasticity of transitions disappears almost completely, but the game is still periodic. A persuasive explanation of the periodicity is proposed by the authors of Ref. [?] using the de Bruijn representation of the memory sequences $\mu$. Here we extend their analysis and explain the dynamics of $A(t)$ by introducing a novel definition of the state.

4.3.1. The initial phase

All steps in which more than one strategy has the same utility are called initial. If $U(t = 0) = \mathrm{const}$ for all strategies, some initial steps are necessary to split the utilities of all pairwise different strategies. We now show that the minimal number of such steps is $2^m$ and the maximal is $2^{m+1}$.

Identical utilities of two different strategies at time $t$ can either differ by $2A(t)$ or remain the same at $t + 1$. They split when the corresponding pairwise different strategies suggest opposite actions after $\mu(t)$. Therefore the shortest time to split the utilities of all $2^P$ strategies is $2^m$. Such a scenario requires all possible histories $\mu$ to appear without any repetitions.

If strategies react in such a way that their utilities do not split from step $t$ to $t + 1$, then the same $\mu$ appears twice. As a result, the strategies that won at step $t$ have to lose at step $t + 1$, because the positive change of their utility makes them preferable to the majority of the population at time $t + 1$. Thereby the sign of $A(t + 1)$ changes compared to the sign of $A(t)$, and a different $\mu$ has to appear. There is only one $\mu$ for which a given half of the different strategies reacts identically; for any other $\mu$ they have to split. Examples of both scenarios are presented in Fig. 15, where strategies are defined as in Tab. 2. The first scenario is relatively easy to follow, so we focus on the second one. The initial value is $\mu(0) = 1$ and all strategies have the same utility $U = 0$. Each agent at $t = 0$ draws one strategy randomly. Let us assume that most of them decide to use a strategy suggesting $a(0) = -1$. As a result, (i) $\beta_2$ and $\beta_4$ get a positive payoff and (ii) the next history is $\mu = 1$ (cf. Tab. 1). After $\mu = 1$ both winning strategies suggest the same action and lose, so the next history is $\mu = -1$. Since the history changed, the glued strategies have to react differently, because two different $\mu$'s cannot cause the same reaction of all strategies. Thus, the longest time to split all $U$ trajectories is $2^{m+1}$, which requires every possible history to appear twice.

4.3.2. The concept of the state

At any step of the game one can rank all pairwise different strategies as the best, second best, third best, etc. The sizes of the fractions corresponding to these strategies are known [17], [18]. An ordered list of the indices of the different strategies, complemented by the value of $\mu$, is sufficient to fully describe the game at a given moment and can be used to characterize the state. Formally, let $\{\beta_\kappa\}_{\kappa=1}^{2^P}$ be the set of pairwise different strategies, indexed arbitrarily. There exists a sorting operator $u(\kappa) \to l$ that orders strategies according to their utilities, such that $l_{\beta_\kappa}$ stands for the position of strategy $\beta_\kappa$ in the ordered list. Then the state is

$$x(t) = \left[\mu(t), l_{\beta_1}(t), \dots, l_{\beta_{2^P}}(t)\right]. \qquad (44)$$

The total number of states is equal to $P\prod_{k=0}^{2^{P-1}-1}\left(2^P - 2k\right)$ and accounts for all possible orders of the $2^P$ strategies, provided each strategy has its anti-strategy⁶, where the pair consisting of a strategy and its anti-strategy is characterized by a normalized Hamming distance equal to one.

Figure 15: The shortest (left) and longest (right) scenario of the initial phase for $m = 1$ and $\mu(0) = 1$. History sequences $\mu(t)$: $1, -1, 1$ (left) and $1, 1, -1, -1$ (right); panel labels $x(t)$: $+3142$ (left) and $-3142$ (right).
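For $m = 1$ the state count can be checked by brute force. The sketch assumes, as the counting argument in footnote 6 does, that the utilities of a strategy and of its anti-strategy are exact negatives of each other, so that in a tie-free ranking they occupy mirrored positions; the helper names are hypothetical.

```python
from itertools import permutations, product
from math import prod


def n_states_formula(m):
    """P * prod_{k=0}^{2**(P-1)-1} (2**P - 2k), with P = 2**m histories."""
    P = 2 ** m
    n = 2 ** P
    return P * prod(n - 2 * k for k in range(n // 2))


def n_states_bruteforce(m):
    """Count rankings in which positions i and n-1-i hold a strategy/anti-strategy pair."""
    P = 2 ** m
    strategies = list(product((-1, 1), repeat=P))
    n = len(strategies)
    valid = 0
    for order in permutations(strategies):
        if all(order[n - 1 - i] == tuple(-a for a in order[i]) for i in range(n // 2)):
            valid += 1
    return P * valid  # each ranking can be combined with any of the P histories
```

Both functions give 16 states for $m = 1$; for $m = 2$ only the closed form is practical and yields 41,287,680 states, which illustrates why the brute-force analysis is restricted to small $m$.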

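Since the vast majority of strategies have a unique utility, the probability (16) for the active strategy $\alpha_n$ can be simplified (cf. also Ref. [13]):

$$\Pr\left[U_{\alpha_n}(t) = u_l\right] = \left(1 - \frac{l-1}{2^P}\right)^S - \left(1 - \frac{l}{2^P}\right)^S, \qquad l \ge 1. \qquad (45)$$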



⁶ The first, arbitrarily chosen, strategy from the set of $2^P$ strategies can be placed in one of $2^P$ positions in the ordered list. Once the position of this strategy is chosen, the position of its anti-strategy is determined automatically. Next, a strategy from the reduced set of $2^P - 2$ strategies is placed in one of the $2^P - 2$ remaining positions, and so forth. Each such ordering can occur with any of the $P$ different $\mu$'s.
As a result, in the limit $N \to \infty$ about $N\Pr[U_{\alpha_n}(t) = u_l]$ agents use the $l$-th best strategy. Analysing the actions of the strategies then provides the values of the aggregate demand in each state. Consider, for example, the case when $|A|$ is the largest possible. Since $\{u_l\}$ is a sorted list of utilities, this is possible if the first half of the strategies in this list suggest actions opposite to those of the last half. Then the probability of the action suggested by the best strategy is equal to

$$\Pr\left[a_n(t) = a^{\alpha_B}(t)\right] = \sum_{l=1}^{2^{P-1}}\Pr\left[U_{\alpha_n}(t) = u_l\right] = 1 - \frac{1}{2^S}. \qquad (46)$$

This means that for large $NS$ about $N(1 - 2^{-S})$ agents have an active strategy suggesting the same action as the best strategy, while the remaining $N2^{-S}$ agents act oppositely. Hence the absolute value of the aggregated demand is equal to

$$|A| = N\left(1 - \frac{1}{2^{S-1}}\right). \qquad (47)$$

In particular, if $S = 2$ then $|A| = N/2$.
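Eqs. (45) to (47) can be evaluated exactly with rational arithmetic. The sketch below assumes, as Eq. (45) does, that each agent's $S$ strategies are drawn uniformly with replacement from the $2^P$ strategies and that utilities have no ties; the function names are illustrative.

```python
from fractions import Fraction


def rank_probability(l, P, S):
    """Eq. (45): probability that the active (best-of-S) strategy has rank l."""
    n = 2 ** P
    return (1 - Fraction(l - 1, n)) ** S - (1 - Fraction(l, n)) ** S


def top_half_share(P, S):
    """Eq. (46): probability that the active strategy lies in the best half."""
    return sum(rank_probability(l, P, S) for l in range(1, 2 ** (P - 1) + 1))


def max_demand_fraction(P, S):
    """Eq. (47): |A|/N when the best half and the worst half act oppositely."""
    share = top_half_share(P, S)
    return share - (1 - share)
```

For $m = 1$ ($P = 2$) and $S = 2$ the rank probabilities are $7/16$, $5/16$, $3/16$ and $1/16$; summing ranks 1 and 2 gives $3/4$, summing ranks 1 and 3 gives $5/8$, and `max_demand_fraction(2, 2)` is $1/2$.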

4.3.3. De Bruijn representation

We know that $U$ trajectories represent mean-reverting processes. Thus, the state space (44) can be projected onto the subspace $x(t) = \mu(t)$, and the dynamics of the MG can be efficiently studied using de Bruijn graphs, as shown in Ref. [24]. The decision history is a sequence of $m$ minority actions

$$\mu(t) = \left[a^*(t-m), a^*(t-m+1), \dots, a^*(t-1)\right]. \qquad (48)$$

$\mu(t + 1)$ is obtained by appending $a^*(t)$ on the right and deleting $a^*(t - m)$ from the left of the vector (48), so that there are two possible successors $\mu(t + 1)$ of $\mu(t)$. If one history can be obtained from another using this procedure, then the latter has a directed edge to the former. Histories may be represented by labelled edges. These rules define the de Bruijn graph of order $m$. Examples for $m = 1$ and $m = 2$ are given in Fig. 16.
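The successor rule of Eq. (48) directly yields the de Bruijn graph. A minimal sketch, representing histories as tuples over $\{-1, +1\}$ (an encoding chosen for this illustration):

```python
from itertools import product


def successors(mu):
    """Eq. (48): drop the oldest minority action, append a*(t) in {-1, +1}."""
    return [mu[1:] + (a,) for a in (-1, 1)]


def de_bruijn_graph(m):
    """Adjacency lists of the order-m de Bruijn graph on histories."""
    return {mu: successors(mu) for mu in product((-1, 1), repeat=m)}


def in_degrees(graph):
    """Number of incoming edges for every history."""
    deg = {mu: 0 for mu in graph}
    for targets in graph.values():
        for mu in targets:
            deg[mu] += 1
    return deg
```

Every node has out-degree and in-degree 2, which guarantees that Euler circuits exist, and the two homogeneous histories carry self-loops, e.g. `(1, 1)` is among `de_bruijn_graph(2)[(1, 1)]`.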



Histories in MGs are not equiprobable [24]. Among all paths on the de Bruijn graph of the game, Euler paths define the shortest sequence of histories in which each strategy loses and wins equally often. In non-Eulerian paths some histories are more frequent, and therefore some strategies are more profitable. We show in the following that in the efficient mode the non-Eulerian paths are rare compared to the Eulerian ones.
Figure 16: De Bruijn graphs of orders $m = 1$ (left) and $m = 2$ (right). Dashed lines represent examples of Euler trails on the graph: one trail for $m = 1$ (left) and one of the two possible Euler trails for $m = 2$ (right).

4.3.4. Algorithm generating strong demand fluctuations

We noticed that a large fluctuation of $A$ is only possible if the game is in one of the two de Bruijn nodes called homogeneous, i.e. consisting of identical symbols: $\mu_{h1(2)} = [-(+)1, \dots, -(+)1]$. Interestingly, peaks are observable only after one of the homogeneous histories, but not after both. In Fig. 17 we present a flow chart illustrating the appearance of strong fluctuations of $A(t)$. Below we describe the algorithm step by step. The first three stages lead to the first peak. The next steps explain why subsequent peaks follow each other and why they have opposite signs.



If $A(t_1)$ stands for the first peak of demand, then three prior conditions have to be fulfilled.

Stage 1

The first condition is that $\mu(t_1 - 1) = \mu_{h1(2)}$, where $\mu_{h1(2)} = [-(+)1, \dots, -(+)1]$ is a homogeneous node.

Stage 2

It is also required that at $t_1 - 1$ the majority of agents decides to change the node. If this is fulfilled, then the minority action is

$$a^*(t_1 - 1) = \begin{cases} 1, & \mu(t_1 - 1) = \mu_{h2}, \\ -1, & \mu(t_1 - 1) = \mu_{h1}. \end{cases} \qquad (49)$$

Hence $\mu(t_1) = \mu(t_1 - 1)$: the minority action is to stay in the same node, and it gives a positive payoff to the winning strategies $\alpha$, i.e. those with

$$a^{\alpha}[\mu(t_1 - 1)] = a^*(t_1 - 1). \qquad (50)$$
Figure 17: The flow chart of the MG evolution algorithm, illustrating the appearance of distinct peaks of demand.



Stage 3

There is a non-zero probability that the strategies corresponding to the first half of the utilities in $\{u_l\}$ have won in the last step. Such a circumstance is possible provided stages 1 and 2 are realized. If this third condition is fulfilled, we mark such a history as $\mu_c$. Then all the strategies in the first half suggest the same reaction after $\mu_c$. Hence the majority decision at $t_1$ is to stay in the node, and the maximal demand (cf. Eq. (46)) is generated. All strategies with high utility get a penalty and the low-utility ones are rewarded by the same amount. The game follows the minority decision and escapes from the de Bruijn node $\mu_c$. When the game leaves $\mu_c$, the strategy set is split into two groups of high and low utility, as illustrated in Fig. 18. In the next steps the game goes to Stage 4.

Figure 18: The time evolution of the utilities (left) and the aggregated demand (right) for the MG with $N = 1601$, $S = 2$, $m = 2$ and $g(x) = x$. Grey solid trajectories represent strategies reacting in the same way after the particular history $\mu_c$. Black dashed trajectories represent anti-correlated strategies reacting in the opposite way after $\mu_c$. $\mu_c$ appears at $t = 105$ and $106$ and then after every $2^{m+1}$ steps.

Stage 4

The next steps do not substantially affect the utilities as long as the history $\mu_c$ does not reappear. No history other than $\mu_c$ ensures that the first half of the strategies in the $\{u_l\}$ list suggest a collective action resulting in the most spiky demand. Hence, after $t_1$, the variations of $A$ do not affect the utility significantly until $\mu_c$ reappears at $t_2 > t_1$, when the set of the best half of the strategies is the same as at $t_1$. Then the best half of the strategies suggest that the game shift to another node, characterized by history $\mu(t_2 + 1) \ne \mu_c$, and the maximal demand $|A(t_2)| = N(1 - 1/2^{S-1})$ is generated. All the strategies in the best half get a penalty proportional to the absolute value of the aggregated demand. Concurrently, the half of the strategies with the lowest utility is rewarded with the same amount (cf. Fig. 18).

Stage 5

Next, the game follows the edge leading to the same node. Subsequently, the best half of the strategies suggest staying in the same vertex $\mu_c$. Again, a high absolute value of demand is generated, but the sign of $A(t_2 + 1)$ is opposite to the sign of $A(t_2)$. Consequently, all strategies with high $U(t_2 + 1)$ get a penalty $N(1 - 1/2^{S-1})$ and, concurrently, strategies with low utility get a reward of the same size.



Stage 6 



The game goes to the vertex $\mu(t_2 + 2) \ne \mu_c$ and stages 4-6 repeat.

Since a high $A(t)$ appears only after the history $\mu_c$, and the Eulerian path contains just two transitions starting from this history, the frequency of peaks is equal to

$$f = \frac{2}{2^{m+1}}, \qquad (51)$$

in agreement with our simulations. The value $2^{m+1}$ is the length of the Euler path, and it corresponds to the period of $A$ observed in Figs. EE.

4.3.5. The Markov process representation

The case of equal-size fractions

As pointed out in Sec. 4.2, rewards and penalties have to compensate if the game is stable. This requires a specific order of states (a cycle) in which every $\mu$ appears twice, ensuring the same magnitudes of reward and penalty for any strategy. Such cycles are considered attractors because, as we will see, they tend to pull in other initial states. The question is: how many attractors exist, and how can one find them? At least two ways of dealing with the problem are possible for equal-size fractions. The first is a brute-force method in which the successor of each state is determined, but its usefulness is limited to small $m$. The other approach requires an analysis of the Euler paths on the de Bruijn graph and is applicable for any $m$. We will show subsequently that the number of attractors is twice the number of Euler paths. Below are examples of both methods.

We present the brute-force method for $m = 1$ and strategies defined as in Tab. 2. For simplicity, we use an abridged notation for the state, e.g. $-3412$ stands for $[-1, 3, 4, 1, 2]$. Each state has to be analyzed and its successor found. Fig. 19 presents the relations between states. There are two attractors:

$$\text{Attractor 1:}\quad [+4231, -3142, -1324, +3142] \qquad (52)$$

$$\text{Attractor 2:}\quad [+4321, -3412, -1234, +3412] \qquad (53)$$
(53) 
Figure 19: Two basins of attraction for $m = 1$ (left). Attractors are marked by grey arrows. Both attractors are projected onto the same Euler path in the de Bruijn graph (right). For simplicity, we use abridged notation for the state, e.g. $-3412$ stands for $[-1, 3, 4, 1, 2]$.



Both attractors are equally possible, provided that $U(t = 0) = \mathrm{const}$ for all strategies. Each attractor ensures that every possible history appears twice: one appearance rewards half of the strategies and the other penalizes them, with each reward and the subsequent penalty of the same magnitude. Moving along an attractor ensures that the game follows the Euler trail in the de Bruijn graph, consistent with the results of Refs. [?], [24]. An example of $U$ trajectories corresponding to these attractors is presented in Fig. 20. The absolute changes of utilities in the analyzed case $m = 1$ take one of two values, $N/2$ or $N/4$, depending on the state. In the former case, both best strategies suggest the same action. Thus $3/4$ of the population acts according to this action and the aggregate demand is $|A| = \frac{3}{4}N - \frac{1}{4}N = \frac{N}{2}$. In the latter case, the first and the third strategy suggest the same action. Hence $5/8$ of the population chooses the same action and consequently $|A| = \frac{5}{8}N - \frac{3}{8}N = \frac{N}{4}$. (From Eq. (45) with $2^P = 4$ and $S = 2$, ranks 1, 2 and 3 are active with probabilities $7/16$, $5/16$ and $3/16$.) An exemplary realization for $m = 1$ is shown in Fig. 21. Both distributions, of $A$ and of $\mathrm{sgn}(A)$, are symmetric. Since the game is fully deterministic, each of the four states $x_1, \dots, x_4$ is related to only one value of $A(x)$. Hence the distribution of $A$ has four peaks.

Figure 20: Utility trajectories for the two possible attractors for $m = 1$.

Figure 21: Time evolution of the aggregated demand $A(t)$, estimated $\Pr(A)$ and $\Pr(\mathrm{sgn}(A))$ for the population size $N = 400$, agent memory $m = 1$, $S = 2$ strategies per agent, identical sizes of fractions (reference system) and linear payoff $g(x) = x$.

A more general way to determine the number of attractors is to count the number of Eulerian paths in the de Bruijn graph. Each attractor consists of a unique set of states that do not appear in other attractors. We proved in Sec. 4.3.4 that each attractor comprises exactly one state characterized by the large oscillation $|A| = N(1 - 1/2^{S-1})$⁷. This state has to incorporate the $\mu$ representing one of the two possible homogeneous nodes of the de Bruijn graph. As a consequence, there are two different states belonging to two different attractors, where both attractors are projected onto the same Eulerian path in the de Bruijn graph. According to the theory of de Bruijn sequences, there are $2^{2^m}/2^{m+1}$ Eulerian paths [25]. Hence there are twice as many attractors, $2^{2^m}/2^m$; e.g., there are 2, 4 and 32 attractors for $m = 1$, 2 and 3, respectively.

⁷ A large oscillation is explicitly connected with a state characterized by half of the best strategies suggesting the same action.

Figure 22: Time evolution of the aggregated demand $A(t)$, plots of the estimated $\Pr(A)$ and $\Pr(\mathrm{sgn}(A))$ for the population size $N = 400$, agent memory $m = 1$, $S = 2$ strategies per agent and unequal sizes of fractions.
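The count of Eulerian paths quoted from Ref. [25] can be reproduced for small $m$ by enumerating Euler circuits on the order-$m$ de Bruijn graph. In this sketch (our illustration, practical only for $m \le 3$) each circuit is counted once by fixing its first edge to be the self-loop at the homogeneous node $[-1, \dots, -1]$; an edge is identified by its source history and the appended action.

```python
def count_euler_circuits(m):
    """Brute-force count of Euler circuits in the binary order-m de Bruijn graph."""
    n_edges = 2 ** (m + 1)
    start = (-1,) * m
    used = {(start, -1)}  # first edge fixed: self-loop at the all-(-1) node

    def extend(node, depth):
        if depth == n_edges:
            return 1 if node == start else 0
        total = 0
        for a in (-1, 1):
            edge = (node, a)
            if edge not in used:
                used.add(edge)
                total += extend(node[1:] + (a,), depth + 1)
                used.remove(edge)
        return total

    return extend(start, 1)
```

The enumeration gives 1, 2 and 16 circuits for $m = 1$, 2 and 3, i.e. $2^{2^m}/2^{m+1}$, and doubling them reproduces the 2, 4 and 32 attractors stated above.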



The case of unequal-size fractions 

The sizes of different fractions most likely vary for strategies drawn randomly. This shifts the a posteriori $A$ distribution with respect to that of the reference system. The mechanism is the same as for the steplike payoff. Consequently, in each state belonging to the attractor, the values of $A$ differ from those in the case of equal-size fractions. If the game followed an attractor, $A$ would not compensate to zero along the path and the utility values would grow or shrink indefinitely. The minority mechanism stabilizes the game and prevents such a scenario by adding states to the attractor. An exemplary realization for the case where strategies are drawn from the uniform distribution is shown in Fig. 22. Both $\Pr(A)$ and $\Pr(\mathrm{sgn}(A))$ are asymmetric if the distributions are considered a posteriori. The comparison of Markov chains with equal and with different fraction sizes is shown in Fig. 23. The game with unequal fractions mostly follows attractor 1 (red arrows), but in three of the four states transitions to other states can also appear. The probability of these transitions is relatively small, indicating that the sizes of fractions do not differ much. The MP representation for unequal fractions is different for each realization.

Figure 23: Two possible attractors for $m = 1$ (left) for a game with equal sizes of fractions. The transition graph for a real game where sizes are unequal, i.e. strategies are drawn from the uniform distribution (right).



4.3.6. The variance per capita $\sigma^2/N$

We proved in Sec. 4.3.4 that large oscillations are periodic and equal to

$$|A| = N\left(1 - \frac{1}{2^{S-1}}\right). \qquad (54)$$

In particular, if $S = 2$ then $\sigma^2 \propto N^2$, consistent with observations and the results of Ref. [17].



The argumentation in Sec. 4.3.4 becomes strict, and Eq. (46) is exact, in the efficient mode when $NS \gg 2^P$, ideally in the limit $NS \to \infty$. However, we also observe cyclic peaks of demand for $N = 1601$ and $m = 5$, when the efficiency condition is not met (cf. Fig. right). In fact, the condition $NS \gg 2^P$ can be relaxed to the requirement that the population be numerous enough for the game to be in the herd mode. Games in that mode do not follow Eulerian paths, because for smaller $N$ the pool of strategies is too sparse and some histories occur more frequently. Nevertheless, the mechanism of peak creation is approximately preserved as long as $N$ is large enough to split the utilities into two groups.

In any case, a somewhat simpler explanation of large oscillations may be given by dividing strategies into two categories: good ones, with positive payoff, and bad ones, with negative payoff. The probability that an agent has no good strategy, or at least one good strategy, is equal to $1/2^S$ and $1 - 1/2^S$, respectively. Rapid fluctuations of demand are transferred to similar fluctuations of the utility. $A(t_1)$ fluctuates after the history $\mu_c = \mu(t_1)$ when the strategies with higher utility indicate identical actions. If $A(t_1)$ strongly fluctuates, then at $t_1 + 1$ about $N(1 - 1/2^S)$ agents have at least one strategy with high utility, and they choose it. Strategies split into two groups, the first consisting of high utilities and the second of low utilities, with a gap between the two groups (cf. Fig. 18). Strategies with utilities belonging to the same group do not suggest the same actions provided $\mu \ne \mu_c$, and therefore no peak of $A$ is generated. $\mu_c$ has a non-vanishing probability to reappear at some $t_2 > t_1$. All agents belonging to the group with at least one high-utility strategy then tend to react identically, and $A(t_2)$ fluctuates maximally, i.e. $|A(t_2)| = N(1 - 1/2^{S-1})$. This is illustrated in Fig. 24 (upper left), where for $S = 2$ we have $A(t = 1000) = N/2$. At $t_2$, all strategies with high $U(t_2)$ fail and get the penalty $-A(t_2)$, whereas those with low $U(t_2)$ are rewarded with $A(t_2)$. After $t_1$ agents are divided into three groups, provided $S = 2$: those with two good strategies, those with one good and one bad, and those with two bad. As seen in Fig. 24, at $t = 1000$ a quarter of the population with two high-utility strategies evolves into the group with two low-utility strategies (lower left), and vice versa for another quarter with two initially low-utility strategies (lower right). The remaining half of the population just swaps the utilities of its strategies (upper right).

Figure 24: The time evolution of the aggregated demand (upper left) and utilities for three cases: an agent with one high- and one low-utility strategy (upper right), two low-utility strategies (lower left) and two high-utility strategies (lower right) at $t = 1000$. These three cases may be quantitatively distinguished using the values of utilities at $t = 1000$, corresponding to the location of the first maximum of $A(t)$ in the upper left panel. Simulation was performed for the MG with $N = 1601$, $S = 2$, $m = 5$ and $g(x) = x$.

The periodicity of $A(t)$ obtained from simulations approaches the theoretical results for a large $NS/2^P$ ratio. If this ratio is small, the game hardly follows the Eulerian path and peaks of $A(t)$ appear randomly.

4.3.7. Stability of the game and behavior of the predictability H

The behavior of $H_A$ is driven by absolute disproportions between the fractions' sizes. The payoff is an explicit function of $A$ and, in order to stabilize the game, the negative and positive payoffs following the same $\mu$ have to compensate mutually. Hence, for any $\mu$, $\langle A^-|\mu\rangle = \langle A^+|\mu\rangle$ and $H_A = 0$. For this kind of payoff, the equal frequency of negative and positive payoffs required for $\mathrm{sgn}(x)$ does not have to be preserved (see Fig. [251], bottom left).

The last point to understand is the plot of $H_a/N$, which seems to be equal to zero in the herd regime. $H_a$ is the sum of $\langle a^*|\mu\rangle$ over the $P$ different $\mu$'s. Each of these components is most likely nonzero and is bounded, $\langle a^*|\mu\rangle \in [-1, 1]$. Thus $\max(H_a) = P = \mathrm{const}$, and in the limit $N \to \infty$ one has $H_a/N \to 0$.

4.4. The effect of imbalance between fractions

One can try to measure how the size of the disproportion between fractions affects the transition probabilities in the Markov chain. To this end we introduce a measure of the distance between two arbitrary processes. Denoting the set of reference processes by $\mathcal{R}$ and the set of examined ones by $\mathcal{E}$, this measure is defined as

$$\Upsilon = \sum_{i,j}\left|\Pr\nolimits_{\mathcal{E}}(x_i)\Pr\nolimits_{\mathcal{E}}(x_j|x_i) - \Pr\nolimits_{\mathcal{R}}(x_i)\Pr\nolimits_{\mathcal{R}}(x_j|x_i)\right|, \qquad (55)$$

where $\Pr_{\mathcal{R}}$ and $\Pr_{\mathcal{E}}$ stand for the probabilities for the reference and the examined system. $\Upsilon \in [0, 2]$ is suitable for comparing any MPs, even those based on different sets of states. If $\Upsilon = 0$, there are no differences between the processes. If $\Upsilon = 2$, the processes are based on disjoint sets of states. The standard deviation $\sigma(F)$ is a measure of the disproportion of fractions. $\Upsilon$ as a function of $\sigma(F)$ is presented in Fig. 25. The left panel presents $\Upsilon$ measured between the reference MP (left-hand diagram in Fig. 13) and 40 games where strategies are drawn from various distributions, provided $g(x) = \mathrm{sgn}(x)$. In the case $g(x) = x$, the function is more complicated, because there is not just one MP representing the reference system: for $m = 1$ there are two equiprobable attractors, corresponding to two MPs. Therefore we use the sum $\Upsilon_1 + \Upsilon_2$ as a function of the imbalance between fractions, as presented in Fig. 25 (right). If the game follows attractor 1, then $\Upsilon_1 = 0$ and $\Upsilon_2 = 2$.

Figure 25: Distance $\Upsilon$ as a function of $\sigma(F)$ for two different payoffs, $g(x) = \mathrm{sgn}(x)$ (left) and $g(x) = x$ (right).
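A minimal implementation of the distance in Eq. (55), written as the $L^1$ distance between the joint probabilities $\Pr(x_i)\Pr(x_j|x_i)$ of consecutive states. Encoding a process as a pair of dictionaries is an assumption of this sketch, as is reading the attractors of Eqs. (52) and (53) as deterministic cycles visited in the listed order with uniform occupancy.

```python
def markov_distance(ref, exam):
    """Eq. (55): sum_{i,j} |Pr_E(x_i) Pr_E(x_j|x_i) - Pr_R(x_i) Pr_R(x_j|x_i)|.

    A process is (stationary, transitions): stationary maps x_i -> Pr(x_i),
    transitions maps x_i -> {x_j: Pr(x_j|x_i)}. States missing from one
    process contribute zero probability there.
    """
    def joint(process):
        stationary, transitions = process
        return {(i, j): p_i * p_ij
                for i, p_i in stationary.items()
                for j, p_ij in transitions.get(i, {}).items()}

    jr, je = joint(ref), joint(exam)
    return sum(abs(je.get(k, 0.0) - jr.get(k, 0.0)) for k in set(jr) | set(je))


def cycle_process(states):
    """Deterministic cycle through `states` with uniform occupancy."""
    n = len(states)
    stationary = {s: 1.0 / n for s in states}
    transitions = {s: {states[(k + 1) % n]: 1.0} for k, s in enumerate(states)}
    return stationary, transitions


attractor1 = cycle_process(["+4231", "-3142", "-1324", "+3142"])
attractor2 = cycle_process(["+4321", "-3412", "-1234", "+3412"])
```

Under this reading, `markov_distance(attractor1, attractor1)` is 0 and `markov_distance(attractor1, attractor2)` is 2, matching $\Upsilon_1 = 0$ and $\Upsilon_2 = 2$ for a game that follows attractor 1.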
5. Conclusions

In this paper we proposed a consistent, reductionist scheme explaining 
phenomenology of minority games in the efficient regime. In this mode the 
size of strategy space is much smaller than the number of strategies used by 
agents and the population as a whole can access complete information about 
the game. 

Our discussion began with the phenomenology. We considered a number
of macroscopic random variables, or their moments, characterizing the game 
and being particularly important for applications, such as the aggregated de- 
mand, demand's variance per capita and decision's or demand's predictabil- 
ities. We studied these variables as functions of the control parameters, e.g. 
the ratio of the total number of agents to the number of all possible win- 
ning histories, as well as their time evolution. Among interesting features we 
found that predictabilities may, or may not, be sensitive to the form of the 
payoff function, depending on how the predictabilities are actually defined: 
using winning decisions only or the overall demand. 

Deeper insight into the mechanism of these behaviors was possible by performing coarse-graining and aggregation of some internal degrees of freedom of the game, thus defining an intermediate level of description, called mesoscopic. At this mesoscopic level, fractions of agents using the same strategies are treated as separate entities. Using this method, in the efficient regime when $NS \gg 2^P$, we also managed to represent the game as a Markov process with a finite number of states.

In cases where the Markov representation is known, two methodologies were proposed to explain our observations. First, in the simplified case, the
quenched disorder was neglected, i.e. fractions were assumed to be equal size. 
In this case, however, not all observations are properly explained. Behavior 
of predictability required extended methodology where the quenched disorder 
was used. Two payoffs, the steplike and linear, were separately analyzed. We 
showed that in case of the steplike payoff, the stochastic and deterministic 
transitions were possible, whereas for the linear payoff, all transitions were 
deterministic. 

We argued that the Markov process representation of the game completely defines and explains the dynamics of the game in the stationary regime, and allows for the calculation of state occupancies. Once the transition probabilities of the Markov chain are known, the phenomenology also becomes understandable. For example, the Markov representation explains the periodicity and the preferred levels of the aggregated demand A(t). In practice, this approach becomes cumbersome for m > 1 because of the large number of states, but the reasoning remains valid in general. We were unable to find any relation between the memory length m and the total number of states.
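State occupancies follow from the transition matrix as the stationary distribution $\pi$ of the chain, i.e. the solution of $\pi = \pi T$ with $\sum_i \pi_i = 1$. The sketch below solves this linear system directly by Gauss-Jordan elimination, which, unlike simple power iteration, also works for periodic chains. It assumes a small, irreducible chain given as a row-stochastic nested list; the function name is hypothetical.

```python
from __future__ import annotations


def stationary_distribution(transition: list[list[float]]) -> list[float]:
    """Solve pi = pi T with sum(pi) = 1 for an irreducible row-stochastic T."""
    n = len(transition)
    # Balance equation for state j: sum_i pi_i * T[i][j] - pi_j = 0.
    # One of them is redundant, so keep n - 1 and add the normalization row.
    rows = [
        [transition[i][j] - (1.0 if i == j else 0.0) for i in range(n)] + [0.0]
        for j in range(n - 1)
    ]
    rows.append([1.0] * n + [1.0])
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        scale = rows[col][col]
        rows[col] = [v / scale for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * w for v, w in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]
```

For the periodic two-state chain [[0, 1], [1, 0]] the function returns [0.5, 0.5], even though the state probabilities of such a chain never converge from a fixed initial state.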

Neither the simplified concept of state nor the Markov process description seems to hold if an initial preference is given to any strategy. The definition of stability was introduced in Sec. 14.1.3 in order to better understand the asymmetries observed for aggregated variables. The stability mechanism turned out to be sensitive to the payoff function. For the steplike payoff, the frequency of opposite signs of A after any history $\mu$ had to be preserved. For the linear payoff, the negative and
positive values of A had to compensate each other. As a result, depending on the payoff, both predictabilities, $H_a$ and $H_A$, were equal to zero in the herd regime.

Differences between systems with equal-size fractions and those with unequal-size fractions were more likely for larger numbers of agents. This was particularly reflected in distortions of (i) the transition probabilities, for the steplike payoff, and (ii) the attractors, for the linear payoff. To quantify this distortion, a measure of distance between two Markov processes was introduced.
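The measure introduced in the text is not reproduced here. Purely as an illustration of the idea, the sketch below uses a common choice that need not coincide with it: the total-variation distance between corresponding rows of two transition matrices, weighted by a distribution over states (for instance the stationary distribution of the reference chain). It assumes both chains are defined on the same, identically labeled states. The function name and the two toy matrices are hypothetical.

```python
from __future__ import annotations


def weighted_tv_distance(t1: list[list[float]], t2: list[list[float]], weights: list[float]) -> float:
    """sum_i w_i * (1/2) * sum_j |T1[i][j] - T2[i][j]|.

    Equals 0 for identical chains and is at most 1 when the weights sum to 1.
    """
    total = 0.0
    for i, w in enumerate(weights):
        row_tv = 0.5 * sum(abs(p - q) for p, q in zip(t1[i], t2[i]))
        total += w * row_tv
    return total


# Made-up transition matrices standing in for an equal-size and an unequal-size system.
equal_size = [[0.9, 0.1], [0.5, 0.5]]
unequal_size = [[0.8, 0.2], [0.5, 0.5]]
d = weighted_tv_distance(equal_size, unequal_size, weights=[5 / 6, 1 / 6])
```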

We studied games with the full, maximal strategy space. Some authors, e.g. in Refs. 0, [26], reduced the strategy space and reproduced many features of the full MG, e.g. the behavior of the variance per capita. The drawback of their method is that it reduces the number of states in the Markov-chain description of the game and significantly affects its time evolution; such a reduction oversimplifies the Markov representation.
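For comparison, the full strategy space for memory m contains $2^P$ strategies, with $P = 2^m$. A frequently used reduced strategy space keeps only $2P$ strategies that are pairwise either uncorrelated or anticorrelated. One way to build it is from the rows of a Sylvester-Hadamard matrix of order $P$ together with their negations. The sketch below constructs both spaces. Whether this coincides with the reduction used in the cited references is an assumption, and the function names are hypothetical.

```python
from __future__ import annotations

import itertools


def full_strategy_space(m: int) -> list[tuple[int, ...]]:
    """All 2**P strategies, each a tuple of P actions in {-1, +1}, with P = 2**m."""
    return list(itertools.product((-1, 1), repeat=2 ** m))


def reduced_strategy_space(m: int) -> list[tuple[int, ...]]:
    """2P strategies: Sylvester-Hadamard rows of order P = 2**m and their negations."""
    h = [[1]]
    for _ in range(m):
        # Sylvester step: H_{2k} = [[H_k, H_k], [H_k, -H_k]].
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    rows = [tuple(r) for r in h]
    return rows + [tuple(-x for x in r) for r in rows]


def correlation(s1: tuple[int, ...], s2: tuple[int, ...]) -> int:
    """Unnormalized overlap sum_mu a1(mu) * a2(mu) of two strategies."""
    return sum(a * b for a, b in zip(s1, s2))


full = full_strategy_space(2)
reduced = reduced_strategy_space(2)
```

For m = 2 this gives 16 strategies in the full space against 8 in the reduced one, and the gap widens rapidly with m.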

Some observables for the proportional payoff were explained without using the Markov process. For example, distinct peaks of the aggregated demand were observed, with height equal to half of the population, assuming S = 2. We showed that in the herd regime, there always exists a
history $\mu$ for which a fraction of agents reacts identically, and this is seen as a peak in A(t) whose height is proportional to N. Apart from using the Markov
chain technique, we found another, simpler one, in which only two classes of strategies are used instead of all $2^P$ classes. This technique is not limited to the case $NS \gg 2^P$ but works in the whole herd regime. This approach was also successfully exploited in our analysis of the multi-market minority game.

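At a fixed history $\mu$, the full strategy space splits naturally into two classes: strategies prescribing +1 and those prescribing -1. The sketch below performs this split and, for a population in which each agent holds several strategies, computes the fraction of agents whose strategies all agree at $\mu$, so that their action at $\mu$ does not depend on their scores. Reading the two classes mentioned in the text this way is an assumption of the sketch, and the function names and toy agents are hypothetical.

```python
from __future__ import annotations

import itertools


def split_by_action(m: int, mu: int) -> tuple[list[tuple[int, ...]], list[tuple[int, ...]]]:
    """Partition all 2**P strategies (P = 2**m) by the action they prescribe at history mu."""
    plus, minus = [], []
    for strategy in itertools.product((-1, 1), repeat=2 ** m):
        (plus if strategy[mu] == 1 else minus).append(strategy)
    return plus, minus


def agreeing_fraction(agents: list[tuple[tuple[int, ...], ...]], mu: int) -> float:
    """Fraction of agents whose strategies all prescribe the same action at mu."""
    agreeing = sum(1 for held in agents if len({s[mu] for s in held}) == 1)
    return agreeing / len(agents)


plus, minus = split_by_action(m=2, mu=0)
# Two hypothetical agents with S = 2 strategies each, for m = 1 (P = 2).
agents = [((1, 1), (1, -1)), ((-1, 1), (1, 1))]
```

Under uniform random draws with S = 2, an agent's two strategies agree at a given $\mu$ with probability 1/2, since each prescribes +1 or -1 there with equal probability.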
Considering further research, it would be a significant achievement to find a single, closed-form equation for the entire parameter region of the MG. So far, such equations have been found in the literature for the crowded and random regions separately, but a unified description of the entire range of parameters is still lacking.